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Dear Readers, 

It’s getting colder and rainy outside. We choose the computer more 
often than a stroll after work. Because of the weather we choose 
a Chat with friends via skype, facebook, or other instead of going 


out. We are lazy and choose to rely on our Internet connection? 


No wonder that even Hy, BSD Magazine's articles published this 

month oscillate e around netv ork topic too. As far as the network is 
SO an Te] routing is needed as well. 

sting wireless adapters for BSD systems. 

Ou about automating deployment of FreeBSD 

3mMs — SO we could afford to be lazy during the 


In Get Started section you will find a nice intro article titled 
“Network Concepts, Routing and Firewalls”, which is intended not 
only for beginners, but also for those who would like to remind 
themselves about the basics.and ground their knowledge. 

The first “How To” article is about installing and running FreeBSD 
as a NAT instance in Amazon Virtual Private Cloud. It’s a really 
good and practical tutorial. We are sure you will learn a lot from 
this one. | can only tell you that the reviewers enjoyed it. 

Furthermore; you can read the second part of Luca’s PostgreSQL 
indexing, where he will show you some examples, which he has 
tested on a PostgreSQL 9.1 cluster. 

We close the issue with Rob’s third part of his series “FreeBSD 
Enterprise Search with Apache Solr”. This time he will teach you 
how to search for internal or external websites and effectively 
manage the results on a large scale. 

Please remember to send us your comments and opinions on 
the articles you've read and the topics you would like to see covered 
in the magazine. Your opinion matters and helps us to do our job 
well. Never think otherwise! 

Well, this is all for now. | wish you a good read! 


Patrycja Przybytowicz 
Editor of BSD Magazine 
& BSD Team 
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What’s New 


OG NETGEAR Universal Wifi Adapter 
By Andrew Gauld 

The trend towards increased internet connectivity of me- 
dia devices (TV's, gaming consoles, DVR’s) has brought 
a work-around for one of few my frustrations with BSD 
operating systems — the limited support for newer wire- 
less adapters. Many of these media devices have an eth- 
ernet port, but no way to attach a wireless adapter. Sev- 
eral companies have stepped up to this opportunity and 
have created universal wireless adapters that connect to 
the ethernet port rather than an expansion port. Since the 
device connects to the ethernet port, no driver is needed. 
Since no driver is needed, these devices should work with 
BSD operating systems. In this article, | will test Netgear’s 
Universal Wifi Adapter, model WNCE2001. 
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10 Automating the Deployment of FreeBSD 
and PC-BSD Systems 


By Kris Moore 

In PC-BSD 9.x every installation is fully-scripted, due to 
the the pc-sysinstall backend. This backend can also be 
used to quickly automate the deployment of FreeBSD 
servers and PC-BSD desktops using a PXE boot envi- 
ronment. In PC-BSD & TrueOS 9.1 and higher, this func- 
tionality is easy to setup and deploy using the “pc-thincli- 
ent” utility. PXE booting allows you to boot systems via 
the LAN interface, as opposed to using traditional media, 
such as DVD or USB. In order for clients to boot via PXE 
they will need a PXE capable network adapter. 


Get Started 
414 Network Concepts, Routing and Fire- 
walls 


By Anders Berggren 
This article is aimed at anyone who wants to learn more 
about networking, routers and firewalls. We will discuss 
this topic in terms of a BSD/PF firewall/router. 


How To 


418 FreeBSD as a NAT Instance in Amazon 
Cloud 
By Andrey aka vand777 
Amazon VPC lets you launch instances in a virtual net- 
work that closely resembles a traditional network that you 
might operate in your own data center. You place pub- 
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licly accessible servers (for example, web servers, DNS 
server etc.) into a public-facing subnet, and place your 
backend systems (databases, application servers etc.) 
in a private subnet with no Internet access. Instances in 
the private subnet can access the Internet only by rout- 
ing their traffic through a NAT instance in a public subnet. 
This article is intended for beginners wanting to install and 
run FreeBSD as a NAT instance in Amazon Virtual Private 
Cloud (Amazon VPC). 


30 PostgreSQL: Indexes (Part 2) 
By Luca Ferrari 

This article continues the previous one, presenting the 
readers with a few index examples and how the access 
costs are computed by the query planner. All the exam- 
ples shown here have been tested on a PostgreSQL 9.1 
cluster running on a FreeBSD 8.2-RELEASE machine; all 
the example source code are available in a GitHub reposi- 
tory. 


42 FreeBSD Enterprise Search with Apache 
Solr (Part 3) 


By Rob Somerville 

One of the important facets of enterprise search is to be 
able to search internal (Intranet) and external websites. 
On a smaller scale, it is relatively trivial to assemble some 
code in PHP or Perl to pull web pages from a site, extract 
the links from the HTML and then “wash, rinse, repeat”. 
The difficulty arises when we want to index, rank, and ef- 
fectively manage these results on a large scale. Almost 10 
years ago, Apache Nutch was developed as the key tech- 
nology to crawl 100 million webpages, and has proved 
time and again that it is an efficient scalable solution. 
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WHAT’S NEW 


NETGEAR Universal Wifi 


Adapter (WNCE2001) 


The trend towards increased internet connectivity of media 
devices (TV's, gaming consoles, DVR’s) has brought a work-around 
for one of few my frustrations with BSD operating systems — the 
limited support for newer wireless adapters. 


any of these media devices have an ethernet 
\/ port, but no way to attach a wireless adapter. 

Several companies have stepped up to this op- 
portunity and have created universal wireless adapters 
that connect to the ethernet port rather than an expan- 
sion port. 

Since the device connects to the ethernet port, no driv- 
er is needed. Since no driver is needed, these devices 
should work with BSD operating systems. In this article, 
| will test Netgear’s Universal Wifi Adapter, model WNCE 
2001. 

The WNCE2001 is a small unit that connects to 
your computer's ethernet port. For power, you get to 
choose between using an AC wall adapter and us- 
ing a USB port — a great option when power outlets 
are limited. The unit is configured through your internet 
browser. 

The hardware I’ve chosen for this test is the Zotac 
ZBox Plus. This computer does not have an internal DVD 
drive, so | will attach an ASUS external DVD drive, mod- 
el SDRW-08D2S-U, which is powered through a second 
USB port. 

Since the four rear USB ports are now being used by 
the keyboard, mouse and DVD drive, | will power the 
WNCE2001 using the AC wall adapter. 

To test the WNCE2001, | will preconfigure the device 
using a live CD of GhostBSD LXDE 3.0 RC1. 
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Step 1 

| created a bootable CD of GhostBSD LXDE 3.0 RC1. You 
can find download links for GhostBSD images at http:// 
www.ghostbsd.org/. 


Step 2 

| assembled the hardware. | attached the WNCE2001 to 
the ethernet port before | turned the computer on. This 
was to ensure that the ethernet port received an IP ad- 
dress from the WNCE2001 when it turned on the DHCP 
client during bootup. 


Step 3 

| inserted the GhostBSD CD into the external DVD drive 
and turned the computer on. In order to boot up the 
Zbox using the external DVD drive, | had to hit the F11 
key on the keyboard. The Zbox then presented a list of 
medium sources. | selected the external DVD drive and 
hit Enter. 


Step 4 

Wireless configuration: It took several minutes for the 
ZBox to boot up to GhostBSD. When it completed, | 
was presented with the LXDE window manager. | used 
the menu buttons on the bottom panel and looked for a 
virtual terminal application. Upon finding and activating 
LXTerminal, | executed the command ifconfig -a. The 
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output can be viewed in Figure 2. As you can see, the 
network interface reo has been assigned an IP address of 
192.168.1.100. 

As | mentioned before, the WNCE2001 is config- 
ured using an internet browser. | found an icon for the 
browser SeaMonkey on the bottom panel. After click- 


[NETGEAR 


Universal WiFi 
Adapter for Smart TV | 
& Blu-ray- 


WALE ZOO 


| Connect one Ethernal-enabled 
device to your exishing hame 
network and the Internel 


| £*)) 
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Figure 1. NetGear Universal Wifi Adapter (WNCE2001) 


L¥Terminal 


: zl< 
media: Ethernet autoselect 


AST RUNNING, SIMPLEX, ,MULIICAS!> metric 0 mtu 290 
ether 74:fO:6d: 29:00:56 
inet6 feso:: f:fezo:d96%ralO prefixlen 64 scopeid Ox2 
[PERPORAALUOG, AUTO _LINRLOCAL> 
2.11 Wireless Ethernet autoselect mode 1llg 
| = acl 
loo: flags=8049<UP, LOOPBACK, RUNNING,MULTICAST> metric 0 mtu 16384 
opti ons=600009<F205U8, TXCSUM,AXCSUM IPV6, TXCSUM IPVG> 
inmet6 ::1 prefixlen 126 
inet6 feBO::1%loO prefixlen 64 scopeid Ox8 
inet 127.0.0.1 netmask Oxf foo000e 
dé options=21<PE-F0RMLD, AUTO _LIAKLOCAL> 
Peland: flags=68c43<UP, BROADCAST , RUNNING, OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1 
en) 


Figure 2. The ethernet adapter, re0, obtained an IP address via DHCP 
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The BSD Certification Group Inc. 
(BSDCG) is a non-profit organization 
committed to creating and 
maintaining a global certification 
standard for system administration 
on BSD based operating systems. 


@) WHAT CERTIFICATIONS ARE AVAILABLE? 


BSDA: Entry-level certification suited for candidates 
with a general Unix background and at least six months of 
experience with BSD systems. 


BSDP: Advanced certification for senior system administrators 
with at least three years of experience on BSD systems. 
Successful BSDP candidates are able to demonstrate 

strong to expert skills in BSD Unix system administration. 


@ WHERE CAN I GET CERTIFIED? 


We're pleased to announce that after 7 months of 
negotiations and the work required to make the exam 
available in a computer based format, that the BSDA 
exam is now available at several hundred testing centers 
around the world. Paper based BSDA exams cost $75 USD. 
Computer based BSDA exams cost $150 USD. The price of 
the BSDP exams are yet to be determined. 


Payments are made through our registration website: 
https://register.bsdcertification.org//register/payment 


@)_ WHERE CANIGET MORE INFORMATION? 


More information and links to our mailing lists, Linkedin 
groups, and Facebook group are available at our website: 
http://www.bsdcertification.org 


Registration for upcoming exam events is available at our 
registration website: 
https://register.bsdcertification.org//register/get-a-bsdcqg-id 


WHAT’S NEW 


ing on the SeaMonkey icon, SeaMonkey opened to the 
the NetGear Smart Wizard at http:/vwww.mywifiext.com/ 
welcome.htm. On this page, | was prompted to select a 
language. | accepted the default (English) and clicked 
‘Continue’. 

The configuration tool searched for available networks. 
On the next page, | was presented a list of networks from 
which to choose. You can see this list in Figure 3. 

After selecting a network and clicking on ‘Continue’, a 
webpage appeared showing the security protocol used 
by the network. | was prompted for a passphrase. | en- 
tered the passphrase and clicked ‘Continue’. The next 
page informed me that the WNCE2001 was connecting 
to the network and that the process would take approxi- 
mately 2 minutes. Once this was done, a webpage ap- 
peared that telling me that the computer was connected 
to the network successfully. The page also showed the 
network name, security protocol and the passphrase | had 
submitted. | clicked on the button, ‘Finish’ and closed the 
browser. To test for internet access, | started SeaMonkey 
again and entered htto:/www.ghostbsd.org/ in the url bar. 
The home page for GhostBSD appeared. 

| selected the LXTerminal window and executed the 
command ifconfig -a. The IP address for reO had been 
changed to 192.168.1.78. 
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Figure 3. The configuration wizard displays available networks 
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Figure 4. The computer now has wireless internet access 
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Step 5 
Curious as to whether | would have to reconfigure the 
WNCE2001 with every use, even when using the same 
network, | decided to reboot the computer. Just for fun, | 
used a Live DVD of PCBSD 9.0. Booting PCBSD 9.0 on 
the ZBox was uneventful except that the live image was 
copied into memory, which took several minutes. When 
it was done, | was given a choice of KDE, Gnome, LX- 
DE or XFCE. | selected LXDE. When LXDE had loaded, 
| opened the Epiphany web browser. The default page, 
http://www.google.com, appeared. The WNCE2001 did 
not have to be reconfigured after rebooting the com- 
puter. 

lf you ever need to reconfigure the adapter, input http:// 
www.mywifiext.com/welcome.htm into the url bar. This 
was the address of the page that appeared in Step 4 when 
we configured the adapter for the first time. When | en- 
tered it, the first webpage of the NetGear Smart Wizard 
appeared, as in Step 4 above. 


Conclusion 

What have we learned? We have established that the 
use of universal wifi adapters and an ethernet port can 
by pass driver compatibility issues for wireless network 
access. We have also established that the ASUS SDRW- 
O8D2S-U and Zotac ZBox Plus (excluding the internal 
wifi adapter) are compatible with GhostBSD 3.0 RC1 and 
PCBSD 9.0. 


ANDREW GOULD 

Andrew Gould is a Senior Revenue Analyst in the healthcare in- 
dustry. He has been using FreeBSD, PostgreSQL and Python for 
over ten years. 
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Automating 


the Deployment of FreeBSD and PC-BSD® Systems 


In PC-BSD 9.x every installation is fully-scripted, due to the the 
pc-sysinstall backend. This backend can also be used to quickly 
automate the deployment of FreeBSD servers and PC-BSD 
desktops using a PXE boot environment. 


tionality is easy to setup and deploy using the pc- 

thinclient Utility. PXE booting allows you to boot 
systems via the LAN interface, as opposed to using tra- 
ditional media, such as DVD or USB. In order for clients 
to boot via PXE they will need a PXE capable network 
adapter. 


| n PC-BSD & TrueOS™ 9.1 and higher, this func- 


The Initial PXE Setup 

To get started, you will need to have a system with two 
network interfaces running PC-BSD or TrueOS 9.1 and 
a complete ports tree in /usr/ports. If you do not have 
the ports tree installed, you can download it by running 
the command portsnap fetch extract update as root. With 
these pieces in place, open a root prompt and run the pc- 
thinclient command. The first screen you see will look 
something like this: Figure 1. 


Figure 3. Selecting the NIC for DHCPD 
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C6. 199 


Enter “y” to continue, and the following screen will be 
shown: Figure 2. 

In this case you are going to be setting up a PXE instal- 
lation server, so enter “i” to continue. (The “r’ option can 
be used to make your system a X thin-client server. More 
information about this can be found on the wiki page at 
the end of the article). After selecting your type of PXE 


system, the thin-client wizard will then begin to build the 


Figure 4. The finish screen for pc-thinclient 
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Figure 5. Booting the client from PXE to the ThinClient Menu 
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net/isc-dhcp42-server port. You will most likely only need 
the default port options, and can hit enter at any confirma- 
tion screens that appear. Once the port has finished in- 
Stalling, the thin-client setup will prompt you again for the 
PXE network interface to use: Figure 3. 


Listing 1. Sample configuration of pc-sysinstall for a PXE client 


#Sample configuration file for an installation using 


PC-—sVsinistall 


installMode=fresh 
installinteractive=no 


hostname=examplesystem 


# Set the disk parameters 
disk0=ada0 

partition=all 
bootManager=none 


commitDiskPart 


# Setup the disk label 

# All sizes are expressed in MB 

# Avail FS Types, UFS, UFS+S, UFS+J, ZFS, SWAP 
# Size 0 means use the rest of the slice size 
disk0-part=UFS+SUJ 1000 / 

disk0-part=SWAP 2000 none 

disk0-part=UFS+SUJ 0 /usr 


commitDiskLabel 


# Set if we are installing via optical, USB, or FTP 
installType=FreeBSD 

installMedium=local 

localPath=/installarchive 

packageType=tar 


installFile=fbsd-release.txz 


# Set the root pass 


rootPass=root 


# Setup our users 

# Setup our users 
userName=kris 
userComment=Kris Moore 
userPass=kris 
userShell=/bin/csh 


userHome=/home/kris 


userGroups=wheel, operator 


commitUser 
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Enter the interface name of the network card you wish 
to use as the PXE interface. This interface will be run- 
ning the DHCP server, and should not be connected to an 
existing network with another DHCP server running. The 
wizard will then finish up the configuration for PXE boot- 
ing, and display a message similar to this: Figure 4. 

Your initial PXE setup is now complete! You may now try 
to PXE boot a client connected on the network interface 
you specified. If the client boots successfully, you should 
be presented with an installation screen like Figure 5. 

By selecting the “install” option you will be presented 
with the example configuration option, which can be used 
to install a basic FreeBSD system. (Below we will look at 
adding your own) By selecting it, you will be asked to con- 
firm one last time that you wish to perform the installation 
using this configuration, and then the installation will pro- 
ceed. In addition to performing installs, you can also ac- 
cess an emergency shell prompt. This can be very useful 
if you have a system which can no longer boot, and you 
wish to access the disk or attempt repairs of some kind. 


Customizing the Installation Scripts 

With your initial PXE configuration finished, you will now 
most likely want to create your own installation scripts. 
The thin-client wizard creates an example installation 
script in the /usr/home/thinclient/installscripts/ direc- 
tory, which is where you will want to place your custom 
scripts as well. Any scripts placed in this directory will be 
selectable as an installation option on the PXE client. 

With the location of the install scripts in hand, lets now 
take a look at the provided pc-sysinstall.example file in 
that directory (Listing 1). 

The included comments in the example make most of 
the functionality fairly self-explanatory. However there 
are a few sections we will take a closer look at custom- 
izing. The first place to note is the installMedium=local 
and localPath=/installarchive options. The pc-sysinstall 
backend supports a number of methods of fetching the ar- 
chive files for installation. In this case we are using the lo- 
cal file fbsd-release.txz, stored In /usr/home/thinclient/ 
installarchive. This directory will appear to PXE clients 


PRO Tip! 

By using the Warden utility in PC-BSD / TrueOS, it is possible 
to setup your install environment inside a jail on the host sys- 
tem. Then when you are satisfied with the configuration, you 
can stop the jail and create the install archive on the host sys- 
tem with the tar command as shown below: 


+ tar Cvlr / sr, homey taimclient/ instal larchive, 


Mya eel i Vee eC WSiey palsy apa i aoe 
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aS /installarchive, allowing you on the host OS to sup- 
ply your own tar archives of any FreeBSD / PC-BSD re- 
lease or configuration you wish. 

After selecting your installation archive options you will 
most likely want to customize the disk layout of the new 
system. In the example provided we are performing a very 
simple full-disk installation to the disk adao, using UFS 
with Soft Updates + Journaling. The resulting installation 
will contain a 1000MB root partition, a 2000MB swap, and 
use the rest of the disk space for /usr as seen Listing 2. 

In addition to UFS support, the pc-sysinstall backend 
can also handle more advanced ZFS disk layouts in a 
similar manner: Listing 3. 


PRO Tip! 

Since every single install with pc-sysinstall is fully scripted, 
a good way to experiment with alternative ZFS layouts is us- 
ing a virtualization tool such as VirtualBox and your PC-BSD in- 
stallation DVD. By running the install GUI, you can customize 
your ZFS disk layout in an easy-to-use manner inside a virtual 
machine. Then when the installation is finished, a copy of the 
saved pc-sysinstall configuration file will be saved onto the in- 
stalled system at /root/pc-sysinstall.cfg. 


Listing 2. Adjusting the UFS disk layout for this thin-client 


#Set the disk parameters 
disk0=ada0 

partition=all 
bootManager=none 


commitDiskPart 
# Setup the disk label 


# Avail FS Types, UFS, UFS+S, UFS+J, ZFS, SWAP 
# Size 0 means use the rest of the slice size 
disk0-part=UFS+SUJ 1000 / 

disk0-part=SWAP 2000 none 

disk0-part=UFS+SUJ 0 /usr 


commitDiskLabel 


Listing 3. Switching to ZFS for the default client FS 
#Setup the disk label 


# All sizes are expressed in MB 

# Avail FS Types, UFS, UFS+S, UFS+J, ZFS, SWAP 

# Size 0 means use the rest of the slice size 
dusk0—part—-4h5 Ue), /Ust),/ var, Kook (mircor: sada) 


commitDiskLabel 


BSD 


MAGAZINE 


References 

Wiki Articles: 
http:/wiki.pcbsd.org/index.php/Thin_Client 
http://wiki.pcbsd.org/index.php/Creating_an_Automated_ 
Installation_with_pc-sysinstall 


Mailing Lists & Forums 
http://lists.pcbsd.org 
http://forums.pcbsd.org/ 


In the example above we have changed our disk lay- 
out from UFS to a single ZFS pool using the entire disk, 
and added the disk drive ada’ to the resulting zpool as a 
mirror device. While this example is fairly simplistic much 
more complex layouts can be created using ZFS, includ- 
ing raidz, dataset options and more. 

One last feature to look at with the thinclient utility is the 
ability to perform completely unattended installations. If 
you plan on only doing fully-automated installations via 
this PXE interface, you may do so simply by creating a 
configuration script named unattended.cfg and placing 
It in the /usr/home/thinclient/installscripts/ direc- 
tory alongside the example. When a client first boots it 
will check for this unattended.cfg file, and if found it will 
automatically use it for installation. Some caution should 
be taken when using this method, since simply plugging 
a PXE booting system into the wrong LAN cable could 
cause it to be re-installed. 


Conclusion 

We have briefly looked at some of the ways PC-BSD and 
TrueOS 9.1 make it easy to setup PXE booting and per- 
form rapid deployment of FreeBSD based systems. We 
have also just begun to scratch the surface of the type 
of installations that can be performed with the pc-sysin- 
stall backend. If you want to do further research into more 
complex installations please take a look at the included 
wiki references on pcbsd.org. We also will be happy to 
continue the discussion with you on the PC-BSD mailing 
lists or forums. 


KRIS MOORE 

Kris Moore is the founder and lead developer of PC-BSD. He lives 
with his wife and four children in East Tennessee (USA), and en- 
joys building custom PC’s and gaming in his (limited) spare time. 
kris@pcbsd.org. 
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Network Concepts, 
Routing and Firewalls 


This article is aimed at anyone who wants to learn more about 
networking, routers and firewalls. We will discuss this topic in 


terms of a BSD/PF firewall/router. 


What you will learn... 

¢ Important networking concepts 
¢ Routing basics 

¢ The PF firewall 


lon, building FreeBSD and OpenBSD based network 

appliances, and the Internet is one of the things that 
fascinates me the most. In my experience, a good un- 
derstanding of the basic principles of IP and networking 
is actually the most important asset when it comes to 
master the Internet. Some readers will surely find some 
of the topics in this article trivial, but | still believe that the 
full comprehension of those trivial facts is key to mak- 
ing people more confident in the area of networking. 
Throughout this article, we will learn about IP, network- 
ing and BSD-based firewalls and routers. Kind of a BSD 
routing crash course. 


work as a product designer at a company called Ha- 


OSI Layers 
Let's start by visualizing the oh-so-important network lay- 
ers. Google the “OSI model” to get the full picture. Be- 
cause | will discuss IP primarily, I'll only talk about layer 2 
and layer 3. When | say “layer 2” you should be thinking 
about Ethernet, hardware (MAC) addresses and switch- 
es. You should also think “broadcast”, because it’s one of 
the fundamental features of Ethernet. Whenever you con- 
nect computers to the same switch, they can broadcast 
freely to each other. 

When | say “layer 3”, you should be thinking IP ad- 
dresses, routers, firewalls and the Internet. Whenever 
you have a network-related problem that you find com- 
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What you should know... 
¢ BSD basics 
¢ Network basics 


plex or diffuse, try to tear the problem apart and put the 
different pieces in categories such as the OSI model’s 
layers. Alright, this started out a bit vague, but please 
read on. 


IP Addresses 
ll head directly to the topic of IP addresses, because it 
connects the topics of layer 2 and layer 3 nicely. 

ll start by describing CIDR notation, which | will use 
throughout this article. Have you seen a net mask such as 
259.259.255.0? You probably have. Anyway, it basically 
says that the subnet it defines has 256 addresses. There- 
fore, it’s a quite common net mask for home or small busi- 
ness networks, with fewer than 250-something devices. 
What it really states is that: “on this layer 2 network, ev- 
eryone with the same IP address except for the last digit 
may speak with each other directly”. In other words, if I’m 
192.168.0.2 and you are 192.168.0.3 we can speak di- 
rectly using layer 2 (a switch; never even considering go- 
ing via a router) because the first three digits of our IP ad- 
dress are the same. However, 192.168.0.2 cannot speak 
directly to 192.168.1.3 on a 255.255.255.0 subnet, be- 
cause their third digit differs. So, what’s CIDR then? Well, 
it's actually just the notation of writing “192.168.0.1/24” 
instead of “192.168.0.1 with net mask 255.255.255.0”. It 
means the same thing, but is a lot shorter. In the same 
fashion, 255.255.0.0 (a subnet with 65536 addresses in 
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it) has prefix length 16. Try Googling “CIDR notation” to 
read more, or find a translation table between net masks 
and prefix lengths. Let’s proceed with a few bullet points 
that we can discuss: 


e IP addresses are assigned to network interfaces. A 
network interface can be physical (Such as an Ether- 
net adapter) or logical (such as a VLAN or tunnel). 

¢ The IP address and the net mask (or prefix length, as 
it's also called) defines the “directly reachable” (layer 
2) network in the routing table. 


For example, let’s say | have a BSD computer/router with 
two network interfaces, and assign 192.168.0.1/24 to one 
of them (the device em1, that | call “LAN”). This can be 
done at runtime by running 


# ifconfig eml 192.168.0.1/24 


| also connect this LAN interface with a switch, which 
has a few computers attached to it. These computers 
has IP address such as 192.168.0.2, .3, et.c. If | “ping” 
one of those computers from my “router”, it will find the 
appropriate interface to send out this ping on by first 
looking at the “layer 2” (directly reachable) part of the 
routing table. This is evident by running: 


# netstat -rn -f inet 
Destination Gateway Flags Refs Use Mtu Prio Iface 


192.168.0/24 link#2 UC 5 0 - 4 eml 


I've actually told the router that the addresses 
192.168.0.0 to 192.168.0.255 (defined by my address 
192.168.0.1 and the prefix length 24) are directly reach- 
able using layer 2 on the LAN interface, and the router 
will not even try layer 3 routing to find the destination. In- 
stead, the router makes an ARP request (trying to find 
the IP address’ MAC address) on the LAN interface, and 
if it gets a response, sends the ping to that MAC address 
out on the LAN interface. This surely sounds trivial to 
many, but is actually very important. We can make a few 
important assumptions from this: 


¢ You should almost never assign two address with- 
in the same subnet to two (or more) different inter- 
faces, because it confuses the computer. Let’s say 
I've assigned 192.168.0.1/24 to one interface (LAN) 
and 192.168.0.10/24 to another interface (WAN). 
They are, as defined by the /24 prefix length, in the 
same layer 3 subnet, and thus in the same ‘“lay- 
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er 2 domain’. Consequentially, the computer/rout- 
er doesn’t know on which interface to send packets 
to 192.168.0.X, because it has two choices. The ex- 
treme case would be to assign the same IP address 
to two different interfaces, which of course, is equal- 
ly forbidden. There are exceptions, but this a good 
rule of thumb. 

¢ You almost never need to create routes to direct- 
ly connected networks. Too many times I’ve seen 
people trying to solve problems by randomly adding 
routes. If you have an address such as 192.168.0.1/24 
on LAN, you dont need to create a route to 
192.168.0.0/24, the BSD kernel should have added a 
proper layer 2 directly connected route for you. 


Routing 

Anything that isn't directly connected in the sense that we 
discussed in the previous section, IP routes are needed. 
The by far most common route is the default gateway, 
which really is 0.0.0.0/0 in network language. The default 
route is added like: 


# route add default 1.1.1.1 


Here are a few important facts. As usual, they don't al- 
ways apply, but are good rules of thumb. 


¢ The most specific route is chosen. If you have 
one default route via your gateway, and a route to 
10.0.0.0/16 via another one (router 2), a packet to 
10.0.1.1 will be sent to router 2. 

¢ All routes has a gateway, and that gateway has to be 
directly reachable using layer 2. That is, if you on- 
ly have an address such as 192.168.0.2/24 (mak- 
ing 192.168.0.X directly reachable) you cannot add 
a route to 10.1.0.0/24 via 172.16.0.1, because your 
computer doesn't know how to speak with 172.16.0.1 
using layer 2. A route to 10.1.0.0/24 via 192.168.0.1 
would be valid, however. 

¢ Routing is uni-directional. Just because you can send 
packets to me, doesn’t mean | have a route back to 
you via which | can send the responses. One exam- 
ple would be a VPN appliance. Say you have a serv- 
er (192.168.0.2), connected via a router (192.168.0.3) 
to the internet. On the same network as the server, 
you places a VPN appliance (192.168.0.3). To this 
VPN appliance a roaming user connects, and get's 
the VPN address 10.0.0.2. The VPN user tries to 
ping 192.168.0.2. The packet travels over the VPN 
to the VPN appliance, which sends it to 192.168.0.2. 
It has reaches it’s destination. The server howev- 
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er, responds by sending a packet to 10.0.0.2 via its 
default route 192.168.0.1 (because neither the serv- 
er nor the router know about 10.0.0.0/16 being be- 
hind the VPN appliance 192.168.0.3) and the packet 
is lost. The roaming VPN user doesnt get a ping re- 
sponses. However, adding a route to 10.0.0.0/16 via 
192.168.0.3 on the server or the router would have 
solved the issue. 


A router typically forwards packets, and in most BSD op- 
erating systems this is enabled by running 


# sysctl net.inet.ip.forwarding=1 


for IPv4. Please note that all of the commands | run has 
to be made permanent by adding configuration to the ap- 
propriate files in /etc. This however is very well docu- 
mented in the respective BSD’s manuals, and not the fo- 
cus of this article. 


Dynamic Routing 

For a long time, routers have typically been hardware ap- 
pliances from companies such as Cisco or Juniper. In re- 
cent years however, open source routing has matured to 
the point of being highly suitable even for demanding in- 
stallations. A modern computer with good network adapt- 
ers can easily handle several hundreds of thousands 
packets per second. 

The by far most common application for BGP routing 
is multi-homing; that is, making your services redundant 
via multiple ISPs. This of course requires redundant con- 
nections, an AS number and provider-independent IP ad- 
dresses, which could be fairly expensive to get hold of. 
Ask your ISP for more information. Once that is man- 
aged however, the configuration is not at all that com- 
plicated. In case you're running OpenBGPD, add to 
bgpd.conf 


AS your-As 

network your-network 

neighbor ISPs-router { 
remote-as ISPs-As 

} 

neighbor another=-ISP- [as 

deny filters... 


Although this is not a copy-and-paste example, it hope- 
fully gives you a glimpse of OpenBGPD's elegance and 
simplicity. The same goes for OpenOSPFD; it’s ospfd. 
conf format is compact and comprehensible, and the 
program itself works like a charm. 
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Aliases Addresses 

In case you ask your ISP for multiple address, they might 
give you either a few addresses, or a whole network which 
you are expected to be the router for via a link network. 
In the first case, where you're just given a few address- 
es, you can add those to your WAN interface (called emo 
in this example) as alias addresses with a all-ones (/32) 
net mask. The reason to use /32 is to avoid adding layer 
2 routes for the addresses. If 1.1.1.2/27 is your primary 
IP address, and you're also given 1.1.1.3 andi.1.1.4 by 
your ISP, these can be added like: 


# ifconfig em0 1.1.1.2/27 
# ifconfig em0 1.1.1.3/32 
# ifconfig em0 1.1.1.4/32 


The PF Firewall 

Most BSDs ship with the packet filter (PF); a “stateful” lay- 
er 3 (e.g. IP) to layer 4 (e.g. TCP) firewall. It’s dual-stack 
by default, meaning that all rules that doesn't explicitly 
specify an address family works for both IPv4 and IPvé6. 
This, along with other nice properties of its configuration 
format, makes it (almost) a joy to create rules for. In fact, | 
like the PF configuration format that much we decided to 
use it straight off in our own firewall software. There are a 
few fundamental concepts which are good to know about: 


¢ The firewall’s ruleset is evaluated every time a packet 
goes in or out of an interface 

¢ Interfaces can be interface groups, hardware ports, 
VLAN interfaces, etc. 

¢ The last matching rule wins, unless a rule is marked 
by the quick keyword (| recommend making rules 
quick) 


| personally prefer to do filtering on interface groups in- 
stead of interface; both for naming them, and grouping 
them. | also find it extremely handy that the interface 
with the default route on it, is automatically added to the 
“egress” group (which can be considered WAN). To add 
the emi interface to a “lan” group, run: 


# ifconfig eml group lan 

Assuming that the WAN interface is in the egress group, 
we could create a simple firewall ruleset performing NAT 
by adding to /etc/pf.conf: 

block. Log 


pass on lan 


pass out on egress nat-to (egress) 
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and apply the changes by running 
+ prctl —£ /etc/pr.cont 


That’s a pretty compact ruleset, don’t you think? Its 
beautiful syntax also makes it very clear just what it 
does. One could add a port forwarding to this setup very 
simply, by adding to /etc/pf.conf 


pass in on egress to port 80 rdr-to 192.168.0.100 
Where 192.168.0.100 could be a web server on your LAN. 
The, in my experience, most common pitfall is to under- 
estimate the importance of the rules’ order. For example, if 
using mostly “quick” rules (in which case the first matching 
rule “wins’”) it’s usually appropriate to move more specific 
rules towards the top, and more general rules towards the 
bottom. Consider the example below 


block Log 
pass quick on lan 
Dass in quick on. lan 6 (wan) port 80 rdr=to 10.0.0.100 


# doesn’t work 


The function of the third rule in the example above would 
be to redirect traffic headed for the WAN interface’s IP to 
an internal server on a DMZ (sometimes referred to as 
“reflection”, being able to access internal resources on 
external addresses from within the network). That won't 
work however, because that rule will never match. The 
rule just above it, the second rule, is more general, and 
will always “win”. The solution would be to switch places 
of the second and third rule. 

Now that you’ve seen some of PF’s elegance, you could 
go on and read OpenBSD’s excellent PF FAQ. The manu- 
al page for pf.conf is also great once you get the hang of it. 


Debugging and Monitoring 
| will not cover debugging in depth, but give you some 
hints and pointers. The manual pages will do the rest. 

As for general network and firewall debugging, trac- 
ing packets can be a very quick way of locating errors. 
Let's say you have a “case” which you expected to work, 
but which doesn't. You might not reach a web server in a 
DMZ, but other servers in the DMZ are responding. You 
don't know what the problem is, or if it’s even in the fire- 
wall. Anyway, start tracing packets. Make sure you have 
a device as initiator. In this case, we want do debug an 
issue only visible from the “outside”. Therefore, use a 
smartphone connected to cellular data and start trying to 
access the web server while having this command run- 
ning on the firewall: 
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# tcpdump -n -i em0 port 80 and host 1.1.1.2 


where emo iS your external interface and 1.1.1.2 is the IP 
youre trying to access from the smartphone. If you see 
traffic, move on the firewall logging 


# tcpdump -n -e -ttt -i pflog0 port 80 


and see if it’s blocked (requires all your block rules to 
have the “log” keyword). If it’s not blocked, move on the 
internal interface where you expect the packets to show 


Up: 
# tcpdump -n -i eml port 80 and host 192.168.0.100 


where emi is your internal interface and 192.168.0.100 
is the IP that you might have done a port forwarding 
to. If nothing shows up, start looking at the firewall ru- 
leset, and make sure that the desired packets are 
in fact matches by your port forward. Perhaps take a 
look at the routing table to make sure that packets to 
192.168.0.X are in fact expected to be sent out on emi. 
And so on. If you however do see packets going out on 
em1, but you don't see the responses, start debugging 
the server. Is it connected? Does it provide any service 
on port 80? Is the server configured with the firewall as 
its default gateway? Otherwise the packets aren't rout- 
ed back. 

Certain debugging and monitoring tasks require you 
to extract information from the system, such as from PF 
or the BGP process. For this, there are handy little pro- 
grams called pfctl, bgpctl, ospfctl, et.c. Google those, 
and you'll find tons of information in manual pages and 
tutorials. 
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as a NAT Instance in Amazon Cloud 


This article is intended for beginners wanting to install and 
run FreeBSD as a NAT instance in Amazon Virtual Private Cloud 


(Amazon VPC). 


What you will learn... 

¢ How to install, configure and update a FreeBSD instance in Amazon 
Cloud, 

- How to configure private and public subnets, routing and security 
groups in Amazon VPC, 

¢ How to set up FreeBSD as a NAT instance between public and pri- 
vate subnets. 


mazon VPC lets you launch instances in a virtual 
A network that closely resembles a traditional network 

that you might operate in your own data center. You 
place publicly accessible servers (for example, web serv- 
ers, DNS server etc.) into a public-facing subnet, and place 
your backend systems (databases, application servers etc.) 
in a private subnet with no Internet access. Instances in the 
private subnet can access the Internet only by routing their 
traffic through a NAT instance in a public subnet. In the con- 
figuration wizard Amazon offers its “native”, Linux based, 
instance as the only choice for NAT instance. However, it 
is possible to configure VPC manually and use FreeBSD 
(or any other operating system) for NAT purposes. 


What we will do 


¢ Configure Amazon Virtual Private Cloud (VPC). 

e Launch a FreeBSD instance in a public subnet. 

¢ Configure route table for the private subnet. 

¢ Configure firewall for the FreeBSD instance. 

¢ Perform basic configuration of the FreeBSD instance. 

¢ Apply security patches and install a custom kernel. 

¢ Setup OpenVPN. 

e Launch a Microsoft Windows instance in the private 
subnet. 

¢ Test Remote Desktop Connection to the Microsoft Win- 
dows instance to prove that our NAT instance works. 
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What you should know... 
¢ FreeBSD basics, 
¢« Amazon Cloud basics. 


As the result of the above steps we will end up with: 


¢ the public (10.0.1.0/24) and private (10.0.2.0/24) sub- 
nets, 
¢ aFreeBSD NAT instance with two network interfaces: 
e the first network interface belongs to the public 
subnet and has the IP address 10.0.1.10, 
¢ the second network interface belongs to the pri- 
vate subnet and has the IP address 10.0.2.10. 
¢ a Microsoft Windows instance in the private subnet 
(IP address 10.0.2.20) which can be accessed (Re- 
mote Desktop Connection) through our FreeBSD 
NAT instance. All outgoing traffic of this Microsoft 
Windows instance goes through the FreeBSD NAT 
instance. 


Step 1. 

Configure Amazon Virtual Private Cloud 

In this step we will create and configure VPC (10.0.0.0/16).: 
public (10.0.1.0/24) and private (10.0.2.0/24) subnets, an 
Internet gateway and a route table for the public subnet. 


Task 1. Create VPC 
Log in into AWS Management Console and then choose 
the “VPC” item from the “Services” menu: Figure 1. 

In the “Navigation” pane, click the “Your VPCs’” link, and 
then click the “Create VPC” button: Figure 2. 
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Type 10.0.0.0/16 in the “CIDR Block” field and then 
press the “Yes, Create” button. 


Task 2. Create public and private subnets 

In order to create public and private subnets, click the “Sub- 
nets” link on the “Navigation” pane, and then click the “Cre- 
ate Subnet” button: Figure 3. You can select a desired avail- 
ability zone as well as specify subnet’s IP address block 
(“CIDR Block” field). You should create 10.0.1.0/24 as your 
public subnet, and 10.0.2.0/24 as your private subnet. 


Task 3. Create an Internet Gateway. 

Create and attach an Internet Gateway (click on the “In- 
ternet Gateways’ on the “Navigation” pane, and then click 
the “Create Internet Gateway” button). After creation, se- 
lect the Internet Gateway in the list and then click the “At- 
tach to VPC” button, select your VPC (10.0.0.0/16) and 
then click on the “Yes, Attach” button. 


Task 4. Create a route table for public subnet 
Create anew route table for your public subnet (10.0.1.0/24): 


) VPC Management Cantole [+ | 
G0 Services ~ Eat snoncut © 
Amazon WAC Console bashbosand 
Region: ‘ 
Svs Gast. vega) © Your Virtual Private Chowd = AWS Service Health 
eee Current Status 
= Amacon VC enables you to create a virtual o Amanen VAS (US Baal = 
nebeork topology = mchading subnets and rote Virginia) 
+ VP. Deane tables - for your C2 rasourees. 0 ao Bod (US Ene 
irgini a} 
VIETUAL FRIETE CLOUDS Click the button below to create a Virtual Private » vie 
’ Your Wes Choad. 
* Subnets Related Links 


* Route Tables : | Get started creating a VPC |) shorted creating a VPC || 
» INDE Gobewars : PIS * VRC Documentation 
+ GHP Opbons Sans * All VP Resaurnes 
» Elastic [Ps * Forum 
» Report an 1tue 


SECURITY 

* Neiwork Ls 

» Seounty Groups 

(ee COOnWELTIONS 

» Customer Gatenays 

* Virtual Private Gateways 


» VPM Conmections 


Figure 1. Amazon VPC Console Dashboard 


Create VPC Cancel xX 


A VPC is an isolated portion of the AWS cloud populated by AWS objects, 
such as Amazon EC2 instances. Please use the Classless Inter-Domain 
Routing (CIDR) block format to specify your VPC'’s contiguous IP address 
range, for example, 10.0.0.0/16. Please note that you can create a VPC no 
larger than /16. 


CIDR Block: (e.g. 10.0.0.0/16) 


Tenancy: Default + 


Figure 2. Create VPC 
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¢ Click on the “Route Tables” on the “Navigation” pane. 

¢ Click on the “Create Route Table” button, select your 
VPC (10.0.0.0/16) from the drop-down list, and then 
click “Yes, Create”: Figure 4. 

¢ This new route table will appear in the list (it will have 
“No” in the “Main” column). Select it in the list, and 
then add a new route (Destination: 0.0.0.0/0, Target: 
your Internet Gateway) in the last line, and then click 
“Add”: Figure 5. 

¢ Click on the “Associations” tab and then associate 
this route table with your public subnet (10.0.1.0/24): 
Figure 6. 


Step 2. Launch a FreeBSD Instance in a Public 
Subnet 

I'd like to take this opportunity to thank Colin Percival who 
creates AMI instances of FreeBSD in Amazon Cloud. 


Create Subnet Cancel |x 


Please use the CIDR format to specify your subnet's IP address block (e.g., 
10.0.0.0/24). Please note that block sizes must be between a /16 netmask 
and /28 netmask. You can create no more than 20 subnets per VPC. Also, 
please note that a subnet can be the same size as your VPC. 


VPC: vpe-fc6d1797 (10.0.0.0/16) ~ 
Availability Zone: No Preference + 


CIDR Block: (e.g. 10.0.0.0/24) 


(Gane | [Yes Greate 


Figure 3. Create Subnet 


Create Route Table Cancel x 


A route table specifies how packets are forwarded 
between the subnets within your VPC, the internet, 
and your VPN connection. 


VPC: vpe-fc6d1797 (10.0.0.0/16) + 


Figure 4. Create Route Table 
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Figure 5. New Route Table Rules 
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You can find available FreeBSD AMIs on his website at http:// 
www.daemonology.net/freebsd-on-ec2/. |n this step we will: 


e Launch a FreeBSD instance and setup its security 
group, 

¢ Disable source/destination checks (otherwise, NAT 
will not work), 

¢ Assign an Elastic IP address (EIP) to the NAT in- 
stance. 


Task 1. Launch a FreeBSD instance. 


¢ Select the “EC2” item from the “Services” menu, and 
then click on the “Launch Instance’ button. 

¢ Choose the “Classic” wizard. 

¢ Click on the “Community AMIs” tab, type the desired 
AMI from the http:/www.daemonology.net/freebsd- 
on-ec2/ website in the “Search” field (for example, 
ami-1d4c9874 to install FreeBSD 9.0 RELEASE amd64 
in US East, North Virginia), and then press ENTER: 
Figure /. 

¢ | will install FreeBSD 9.0 RELEASE amd64/HVM as 
this was the latest release version at the time of this 
writing. 

¢ On the next wizard step choose the instance 
type, click on “VPC” and select your public subnet 
(10.0.1.0/24) from the list. Click “Continue”. 

¢ Select “2” in the “Number of Network Interfaces” 
drop-down list, and assign IP addresses to each in- 
terface: 


Routes Associations | Route Propagation 


Subnet Actions 


subret-e183298a (20.0.1.0/24) | Disassociale | 


Select a subnet 


The folowing subnets have not been associated with any route tables and are therefore using the Main table routes: 
* submet-97ho2ehe (10.0.2.0/24) 


Figure 6. Route Table Associations 
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Figure 7. FreeBSD Community AMIs 
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¢ 10.0.1.10 for etno (Subnet 10.0.1.0/24), 
¢ 10.0.2.10 for etni (change subnet to 10.0.2.0/24). 
e You can assign secondary IP addresses for each in- 
terface if needed. Then click “Continue” button twice. 
¢ Name your instance and then click “Continue”. 
¢ Create a new key pair or choose the existing one, 
and then click “Continue”. 
¢ Create a new security group (Figure 8): 
¢ Open port 22 for SSH (TCP), 
¢ Open port 1194 for OpenVPN (UDP). 
¢ Click “Launch”. 


Task 2. Disable source/destination checks on the 
NAT instance’s network interfaces 

By default, each instance performs source and destination 
checking. This means the instance must be the source or 
destination of any traffic it sends or receives. For a NAT 
instance to perform network address translation, we must 
disable source/destination check on our FreeBSD instance: 


¢ Click on the “Network Interfaces” on the “Navigation” 
pane. 

¢ Right-click on each of the interfaces of your NAT in- 
stance and choose the “Change Source/Dest Check” 
item from the popup menu: Figure 9. 

¢ You should disable source/destination checks on 
each of the network interfaces of your NAT instance: 
Figure 10. 

¢ | also usually make a note of the private network in- 
terface ID (the one with IP 10.0.2.10). This informa- 
tion will be needed later. 


Task 3. Assign an Elastic IP address (EIP) 
To access your NAT instance from the Internet, you should 
assign an Elastic IP address (EIP) to your NAT instance: 


Request Instances Wizard Cuncal 4 


Choose One oF none of your existing Sequrity Groups 
2 Create a new Secunly Group 
Group Mame HaiSG 


Group betcription Fresh MAT SG 


Inbound Rules 
Create a Qasiom TOF rule = TCP 
peat Port (Service) Source Action 
Port range: . 
rear 22 (55H) 8.0,0.0/0 Delete 
Source: 0.0.00 onal 
ae Port (Service) Source Action 
Sra an | sia 0.0.0.0/0 Delate 
ai Adel Fete | 
Back | Cones . i 
Figure 8. Create a New Security Group 
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¢ Click on the “Elastic IPs” on the “Navigation” pane. 

¢ Click “Allocate New Address”. 

¢ Select “VPC” from the drop-down list and then click 
“Yes, Allocate’. 

¢ Select the allocated IP address in the list and then 
click on the “Associate Address” button. 

¢ Select your NAT instance in the “Instance” drop-down 
list and select “10.0.1.10*” in the “Private IP address” 
field. Click “Yes, Associate” button. 


Step 3. Configure route table for the private 
subnet 

We will use the main route table to configure routing for 
the private subnet: 


¢ Select the “VPC” item from the “Services” menu. 
¢ Click on the “Route Tables” on the “Navigation” pane. 
¢ Select the main route table and then add a new route: 
¢ Destination: 0.0.0.0/0, 
¢ Target: private interface of your NAT instance (with 
the IP 10.0.2.10). (We have made notes of the pri- 
vate network interface ID when we were disabling 
source/destination checks). See Figure 11. 
¢ Click on the “Associations” tab and then associate 
this route table with your private subnet (10.0.2.0/24): 
Figure 12. 


Now your Amazon VPC configuration is completed and we 
can focus on configuration of our FreeBSD NAT instance. 
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Figure 9. Change Source/Destination Checks Settings 
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Figure 10. Disable Source/Destination Checks 
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Step 4. Configure Firewall for the FreeBSD Instance 

In this step we will enable and configure PF firewall. You 

need to connect to your FreeBSD instance via SSH as 

root and then create the /etc/pf.conf file: Listing 1. 
Then enable PF firewall: 


echo ‘pf enable="YES”’ >> /etc/rc.conf 

echo ‘pf rules=”"/etc/pf.conf”’ >> /etc/rce.conf 
echo ‘pflog enable="YES”’ >> /etc/rc.conf 
ferce/tend/pt Start 


He HEHEHE 


Reconnect via SSH. 


Step 5. Perform Basic Configuration of the 
FreeBSD Instance 
In this step we will configure our FreeBSD instance: 


¢ Configure network interfaces. 

¢ Add anew user. 

¢ Configure SSH daemon. 

¢ Update ports tree. 

¢ Install openntpd. 

¢ Enable SSHGuard. 

¢ Install tools to manage ports updates and monitor vul- 
nerabilities. 


Task 1. Configure network interfaces 
Make sure that your server can act as gateway: 


# echo ‘gateway enable="YES”'’ >> /etc/rc.conf 
Configure the 2" network interface: 


# echo ‘ifconfig xnl="”inet 10.0.2.10/24”' >> /etc/rc.conf 
# ifconfig xnl inet 10.0.2.10/24 


To improve network performance, it is important to do 
the following: 


# echo ‘net.inet.tcp.tso=0' >> /etc/sysctl.conf 
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Figure 11. Main Route Table Rules 
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The following subnets have not been associated with any route tables and are therefore using the Mam table routes: 


Figure 12. Main Route Table Associations 
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Reboot the server: 

# reboot 

Task 2. Add a new user 

In order to add a new user, execute the following com- 


mand: 


# adduser 


Please do not forget to “invite” your new user to the 
wheel group. It is more convenient and secure to set- 
up SSH authorization via a DSA/RSA key than to type 
the password all the times. Let’s do it. Generate DSA 
keys: 


# su your new user 


6 ssh-keygen -t dsa 


ole 


cat ~/.ssh/id dsa.pub >> ~/.ssh/authorized keys 


Listing 1. /etc/pf.conf file contents 


Serge aml 
LONG te canes 
LOC PE = loos 
cial als Se re Ua 


mmalvolelolo eee NO e221) 


table <private subnet> { 10.0.2.0/24 } 
table <firewall> { self } 

table <openvpn clients> { 10.8.0.0/24 } 
table <sshguard> persist 

taller <q avis) 92) 163. 50.0/24 5) 


scrub in all 


# NAT 

nat on $ext if inet from <private subnet> to !<firewall> 
=> See 218) 

nat on $ext if inet from <jails> to !<firewall> -> (Sext_ 
alas) 

MeNe (Old Sake IIe sea eloeienyeim vellwsmess> Ge) Gere iene. syle 


= a (Saliaie patg) 


block log sal 

block an quick Om Sex sit Proko Lep trom <somguard- 
eo) (eric ale) ewe Gscin Icio il sila 
brUEeronee 

lolleyel< alia epbliell< ial SG Wjeim Abie je wioiele) ieee) se) Asisiolebrelaciel 
bO (Sexe aie) eo sisi Ikeloell “isa 


bruteforce” 


# OpenVPN 
Pass In On ext Ine prove Udp fromediy vo WGcxu elk) 


port 1194 keep state 


# ssh 
joel ION Ola SiSyaic ali joule) wee: ee) Sieyce az)! joerc ssla lees 
S/SA keep state 


Pass 10) O00) 2yOny is proro cp from <openpneelvenics> to 
(Sext_if) port ssh flags S/SA keep 


state 


# RDP 
Pass 10 ON [Vpn If PrOLe tcp from —Openvpn clicnrs> co 
Swin host port 3389 flags S/SA keep 


state 


# ICMP to the external IP and OpenVPN gateway IP 

Lene Gy pes = | seehored  inveacins| 

Pass in) OR) = ext lr mee proLo emp trom amy FO (>ext 15) 
MEMS MEVISE Salemle Neyees Meee siecle 

pass 1) On Svpn if dimee prove dcmp from any to (2vpm it) 


EW EVs Sakemie Teyeec kee) siecle 


# pass out this server 


pass out proto { tcp, udp, icmp } all keep state 


# pass out private subnet 

pass 1) On Simi ti prove {["ECp, udp, icmp | from 
“phivdue subneE seo. <imewall lage 
S/SA 


# pass out jails 
pass in) on -loc lf provo { Eco, ude, temp) from <jarlc- 


to !<firewall> flags S/SA 
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Then copy your private key (~/.ssh/id_ dsa) to your MIi- 
crosoft Windows workstation: 


e Print your private key in the SSH console: 
s Cat »/.ssh/id dsa.pub 


¢ Copy the printed text into a new id_ dsa file on your 
Microsoft Windows workstation. 
¢ On your Windows workstation, convert ia dsa key to 
Putty .opk without a passphrase: 
¢ Download PuttyGen from htto://the.earth.li/Y%o7Esgta 
tham/putty/atest/x86/puttygen.exe and open it. 
¢ Click the “Load” button, choose “All files (*.*)”, and 
select your id dsa key. 
¢ Click on the “Save Private Key” button. 


From now you are able to connect via SSH by using your 
private key instead of typing your password all the times. 


But before you do this, let’s change root’s password: 


% exit 


# passwd 

Task 3. Configure SSH Daemon 

Make sure that SSH daemon listens on 10.0.1.10 only. 
Add the following line to /etc/ssh/sshd_config: 
ListenAddress 10.0.1.10 


Also disable login for root user: 


-PermitRootLogin yes 


+PermitRootLogin no 

Then restart SSH: 

# /etc/rce.d/sshd restart 

Before you close the existing SSH session please check 
that you can open a new SSH session as the user you 


added in the previous task. 


Task 4. Update ports tree 
Update ports tree for the first time: 


# portsnap fetch extract 
In future you should update ports tree as below: 


# portsnap fetch update 
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Task 5. Install Openntpd 
OpenNTPD provides the ability to sync the local clock to 
remote NIP servers and can even act as NTP server it- 
self, redistributing the local clock. 

Configure ntpdate to set the clock at boot time: 


# echo ‘ntpdate enable="YES”’ >> /etc/rc.conf 


# echo ‘ntpdate hosts="pool.ntp.org”’ >> /etc/rc.conf 
Install net/openntpd: 


# cd /usr/ports/net/openntpd 
# make config-recursive 


# make install clean 


Default configuration will suffice for our purposes 
but feel free to edit /usr/local/etc/ntpd.conf if need- 
ed. Configure openntpd to start at boot time and start it 
now: 


# echo ‘openntpd enable="YES”'’ >> /etc/rc.conf 


# /usr/local/etc/rc.d/openntpd start 


Task 6. Enable SSHGuard 
SSHGuard monitors server from its logging activity. When 
logs convey that someone is trying to brute-force SSH 
password, SSHGuard reacts by blocking this host. 

Let's first install it: 


# cd /usr/ports/security/sshguard-pf 
# make config-recursive 


# make install clean 


Your /etc/syslog.conf has been added a line for SSH- 
Guard; uncomment it and then reload the configuration: 


# /etc/rc.d/syslogd reload 


This command will display the set of addresses blocked 
in the SSHGuard table at any time: 


# pfctl -Tshow -tsshguard 
Task 7. Install tools to manage ports updates and 
monitor vulnerabilities 


Install ports-mgmt/portmaster: 


# cd /usr/ports/ports-mgmt/portmaster 


# make config-recursive 


You should check the box against the pxcncpatcsx option. 
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make install clean 

echo ‘WITH PKGNG=yes’ >> /etc/make.conf 

rehash 

pkg2ng 

ep /usr/ local/etc/pkg.conf.sample /usr/local/eétc/pkg, cont 


S$ + S$ 3S = 


Edit the /usr/local/etc/pkg.cont file: 


PACKAGESITE : http://pkgbeta.freebsd.org/S{ABI}/latest 


Install ports-mgmt/portaudit: 


cd /usr/ports/ports-mgmt/portaudit 
make config-recursive 
make install clean 


rehash 


S$ + S SF = 


porlaudit =-Fda 


Step 6. Apply Security Patches and Install a 
Custom Kernel 

FreeBSD is a very secure operating system. However, 
even in this operating system people sometimes find se- 
curity holes. (The list of released FreeBSD Security Advi- 
sories can be found on the http:/www.freebsd.org/secu- 
rity/advisories.html webpage). 

In this step we will patch all security holes by updat- 
ing source and then recompiling world and kernel. We 
cannot use freebsd-update utility for these purposes 
because FreeBSD AMIs use a custom kernel. To re- 
compile the kernel from sources is the only option we 
have. 


Task 1. Synchronise Source 
Setup configuration for csup utility: 


# cp /usr/share/examples/cvsup/standard-supfile 


/etc/freebsd-supfile 


Open /etc/freebsd-supfile and change the URL of your 
CVSUP server in the following line: 


‘default host=CHANGE THIS.FreeBsD.org 

Note 

You can find the list of CVSUP servers on the FreeBSD 
website. 


Save the file and then update the source: 


# csup /etc/freebsd-supfile 
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Task 2. Apply EC2-specific Patches 

Some EC2-specific patches have been applied to Free- 
BSD in order to improve performance and work around 
bugs. These are contained in /root/ec2-bits/*.patch: 


# ls /root/ec2-bits/*.patch 
/coot/ec2-bits/blkfront.patch 
/root/ec2-bits/ec2.patch 
/root/ec2-bits/tcp mbuf chain limit.patch 


/root/ec2-bits/uart.patch 
Let's apply this patches: 


# cd /usr/src 

# patch < /foot/éc2-bits/blkftront. patch 

# patch < /root/ec2-bits/ec2.patch 

# patch < /root/ec2-bits/tcp mbuf chain limit.patch 


# patch < /root/ec2-bits/uart.patch 


Task 3. Create a Custom Kernel Configuration 

For 32-bit systems, you can find the kernel configuration 
files in the /usr/src/sys/i386/conf directory. However, we 
have installed a 64-bit operation system, so let’s use con- 
figuration files from the /usr/src/sys/amd64/conf directory: 


# cd /usr/src/sys/amd64/conf 


| always strip kernel from the drivers | do not use. You 
should copy the ceneric file into my _GENERIc and then 
comment those drivers you do not need in your kernel. 
lf you do not know what to comment, then leave every- 
thing as it is. 


# cp GENERIC MY GENERIC 
# vi MY GENERIC 


Copy xENHv» into MYKERNEL file, then edit myKERNEL file: 


# Cp XENHVM MYKERNEL 
# vi MYKERNEL 


Make the following changes in the file (remove the red 
lines and add the green lines without “+”): 


-include GENERIC 
+include MY GENERIC 
-ident XENHVM 

t+ident MYKERNEL 
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+# PF 
+device pf 
+device pflog 


+device pfsync 


Task 4. Build world and Custom Kernel 

Type the code from Listing 2. When running mergemaster 
for the first time, please select “d” (delete “temporary file”) 
for the following files: 


/etc/group 


/etc/master.passwd 


Note 
A“temporary file” is a file without your configuration chang- 
es, the one which comes with FreeBSD source. 


lf you are prompted for other files and these files contain 
some configuration changes you've done and you want 
to keep these changes, then you should also choose “d”. 
lf you want to replace your amended files with the default 
version from FreeBSD source, then select “i”. 

When you are running mergemaster for the second time 
(after make installworld), do not delete ec2 ~* files from 


/etc/rc.d/ when asked. Same logic as above applies to 
this step. Be ready to answer many questions. It will take 
a while but you will be rewarded with the new custom ker- 
nel with all security patches applied. 


Step 7. Setup OpenVPN 

OpenVPN is a robust and highly flexible tunneling appli- 
cation that uses all of the encryption, authentication, and 
certification features of the OpenSSL library to securely 
tunnel IP networks over a single TCP/UDP port. 

We will use OpenVPN on our FreeBSD NAT instance 
to make only one port open to public (UDP 1194). Once 
connected and authenticated, legitimate OpenVPN users 
will get access to all the servers behind the NAT instance, 
subject to the NAT instance’s firewall rules. 


Task 1. Install OpenVPN 
Install OpenVPN from ports: 


# cd /usr/ports/security/openvpn 
# make config-recursive 


+ make install clean 


Leave the default options when asked. 


Listing 2. Rebuilding “world” and custom kernel 


CG st sinc 

make cleanworld && make cleandir 
make buildworld 

make buildkernel KERNCONF=MYKERNEL 
make installkernel KERNCONF=MYKERNEL 
reboot 

adjkerntz -i 

fle CRS ieveicieic jo) i) 

6G /usr/ sre 

make installworld 

make delete-old 

MeLGenaste rn ==) 

reboot 

Cd sn sGe 


make delete-old-libs 


[== SS SS SS SS Ss SS SS SS Ss SS Ss Ss SS SS 


# mkdir /usr/local/etc/openvpn 


# chmod 0755 /usr/local/etc/openvpn/easy-rsa/2.0/* 


Listing 3. Script to copy the sample configuration and the scripts to generate the certificates and keys 


# cp /usr/local/share/doc/openvpn/sample-config-files/server.conf /usr/local/etc/openvpn 


# cp -a /usr/local/share/doc/openvpn/easy-rsa /usr/local/etc/openvpn 
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Listing 4. Script for generating certificate for a user 


# cp /usr/local/etc/openvpn/keys/ca.crt /usr/local/etc/ 
openvpn/easy-rsa/2.0/keys/ 

# cp /usr/local/etc/openvpn/keys/ca.key /usr/local/etc/ 
openvpn/easy-rsa/2.0/keys/ 


# ./build-key user name.your.domain 


Listing 5. OpenVPN server configuration file 


# MS Fix for Apps like Remote Desktop 
fragment 1400 


ms sfix 


# Which local IP address should OpenVPN 
# listen on? (optional) 

7 lolol e Slovicr yl 

ikoceie eh. 02 lek 


ca /usr/local/etc/openvpn/keys/ca.crt 
cert /usr/local/etc/openvpn/keys/openvpn. your.domain.crt 
key /usr/local/etc/openvpn/keys/openvpn. your.domain.key 


# This file should be kept secret 
dh /usr/local/etc/openvpn/keys/dh2048.pem 
# Push routes to the client to allow it 
# the server. Remember that these 
# private subnets will also need 
# to know to route the OpenVPN client 
# address pool (10810007 755.255.2557 0) 
# back to the OpenVPN server. 


pushes “—eoutesiOe Ole 0 2553255225 5.0 7 
pushy soutke. 10202220" 255.250.2005 e 


Certain Windows-specific network settings 
can be pushed to clients, such as DNS 

or WINS server addresses. CAVEAT: 
http://openvpn.net/faq.html#dhcpcaveats 


The addresses below refer to the public 


SS Sf Se Se Se Sis 


DNS servers provided by opendns.com. 
push “dhep-option DNS 8.6.4.4 
JOUcme Ne p-Ooelon DNS 206 so1e2 20m 2 70 


# It’s a good idea to reduce the OpenVPN 

# daemon’s privileges after initialization. 
# 

# You can uncomment this out on 


# non-Windows systems. 


user nobody 


group nobody 


# Output a short status file showing 
# current connections, truncated 
# and rewritten every minute. 


status /var/log/openvpn-status.log 


By default, log messages will go to the syslog (or 
om Windows, if running as a Service, they will go t0 


the “\Program Files\OpenVPN\log” directory). 


“log” will truncate the log file on OpenVPN startup, 


while “log-append” will append to it. Use one 


# 
# 
# 
# Use log or log-append to override this default. 
# 
# 
# or the other (but not both). 

G kere openvpn.log 


log-append /var/log/openvpn.log 


Listing 6. Configuration file for your Windows OpenVPN client 


client 

dev tun 

PROLO ude 

remote openvpn.your.domain 1194 
resolv-retry infinite 


nooind 


fragment 1400 


persist-key 


persist-tun 


Gay “dis patito directory \ea.crt™ 
cert “d:\\path\\to\\directory\\user name.your.domain. 
Cie. 


key “d:\\path\\to\\directory\\user name.your.domain.key” 


Compal zo 


verb 3 
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Task 2. 
Generate Certificates and Keys for OpenVPN Server 
Now let’s create a configuration directory, and then copy 
the sample configuration and the scripts we will need to 
generate the certificates and keys for our server and cli- 
ents: Listing 3. 

You can edit /usr/local/etc/openvpn/easy-rsa/2.0/ 
vars file if you want to change the default RSA key size 
etc. 


# /bin/sh 
# cd /usr/local/etc/openvpn/easy-rsa/2.0 && . 
# ./clean-all 


fd Tacs 


Then create the CA: 
# | /build-ca 


When asked for the “Common Name’ please specify the 
Fully Qualified Domain Name (FQDN) of your NAT in- 
stance (for example, openvpn.your.domain). 

Generate the server key file: 


# ./build-key-server openvpn. your.domain 


When asked for the “Common Name” please again 
specify the Fully Qualified Domain Name (FQDN) of 
your NAT instance. Leave the Challenge Password emp- 
ty. 

Create the Diffie-Hellman parameters: 


# ./build-dh 


This can take a while... 
Copy the files you've generated to the /usr/local/etc/ 
openvpn/keys directory: 


# mkdir /usr/local/etc/openvpn/keys 
# cp /usr/local/etc/openvpn/easy-rsa/2.0/keys/* 
/usr/local/etc/openvpn/keys 


Then clean up everything from the /usr/local/etc/ 
openvpn/easy-rsa/2.0/keys directory: 


# ./clean-all 


Task 3. Generate Certificates and Keys for OpenVPN 
Users 

Generate certificate for a user: Listing 4. When asked for 
the Name and email address, please specify user’s name 
and email address. 
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Each user certificate you create, needs to have a 
unique Common Name (CN). | usually use user name. 
your.domain format for the Common Name to ensure 
the uniqueness. Leave the Challenge Password empty. 
Execute the above command for each user which will 
be connecting to your OpenVPN server. After that you 
should copy the user certificates to the /usr/local/etc 
/openvpn/keys directory: 


# cp /usr/local/etc/openvpn/easy-rsa/2.0/keys/* 
/usr/local/etc/openvpn/keys 
# ./clean-all 


The files you need to provide to each user are: 


bd Ca.CLe 
* “user . hame.your domain, cre 


* User Name. your.domain.key 


You can transfer the files by using WinSCP in the SCP 
mode. The key file (user _ name.your.domain.key) iS accessi- 
ble only by root. So you will need to change its permissions 
to make the transfer. But do not forget to revert it back once 
you are done. The file must be readable only by root. 


Task 4. Configure OpenVPN server 

Now let’s configure OpenVPN server. Open /usr/local/ 
etc/openvpn/server.conf and make the following changes: 
Listing 5. Enable the OpenVPN server during boot-time: 


# echo ‘openvpn enable="YES”' >> /etc/rc.conf 
# echo ‘openvpn configfile="/usr/local/etc/openvpn/server. 
cont’ >> 7etc/re.cont 


# echo ‘openvpn if="tun”’ >> /etc/rce.conf 
Start manually the OpenVPN server: 


# /usr/local/etc/rc.d/openvpn start 


Task 5. Configure OpenVPN Clients 
Please find the configuration file for your Windows Open- 
VPN client in the Listing 6. 


Note 
The OpenVPN client can be downloaded from http://open- 
von.se/download.html. 


Step 8. Launch a Microsoft Windows Instance in 
the Private Subnet 

Let's launch a Microsoft Windows instance in the private 
subnet to test that our NAT instance works properly: 
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¢ Select the “EC2” item from the “Services” menu in 
the AWS Management Console, and then click on the 
“Launch Instance’ button. 

¢ Choose the “Classic” wizard. 

¢ Select “Microsoft Windows Server 2008 R2 Base 64-bit”. 

¢ On the next wizard step choose the instance type, 
click on “VPC” and select your private subnet 
(10.0.2.0/24) from the list. Click “Continue”. 

¢ Select “1” in the “Number of Network Interfaces” 
drop-down list, and setup 10.0.2.20 as the IP address 
(subnet 10.0.2.0/24). 

¢ You can assign secondary IP addresses if needed. 
Then click “Continue” button twice. 

¢ Name your instance and then click “Continue”. 

¢ Create a new key pair or choose the existing one, 
and then click “Continue”. 

¢ Create a new security group and open port 3389 for 
RDP (TCP). See Figure 13. 

¢ Click “Launch”. 


Step 9. Test Remote Desktop Connection to the 
Microsoft Windows Instance 
Now let's retrieve Windows password: 


¢ Select the “EC2” item from the “Services” menu in 
the AWS Management Console. 


Reques! Instances Wirarcd Cased E 


caneuen 7 GE adi 
inty graups determina whether a nefwerk pert ia open er blocked an your inetances, You may use an esting securby group 
Ofoee Can elo you oats a new aecunby oraua fo Sidi a60es4s bo your tans wand the 4uaensted farts below, 4d 
actehora! orts how of utkdate your Setunty Group anybne uieng be Setunty Groups page 
Choose one or merc of your existing Securily Groupes 

© Create a new Security Group 

Group Mare Fuld _Serer_3G 
Group Description Build Serer SG 
Inbound Rules 


Create a Custom TCP mile . TOP 


mea mule: 


Port (Service) 
3389 (RDP) 


Source 
Port ra 1 
ds 0.0.0.0/0 Cle be 


| SI Packed Pluie 


Back Continue a 


Figure 13. Create a New Security Group for Windows Instance 


0.0.0.0/0 Delete 


Source Action 


?? (SSH) 0.0.0.0/0 Delete 
BO (HTTP) 10.0.7.70/37 Delete 
443 (HTTPS) 10.0.7.70/37 Poeshes tes 


1194 0.0.0.0/0 


Delete 


Figure 14. Updated Security Group for NAT Instance 
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¢ Click on the “Instances” on the “Navigation” pane. 

¢ Right-click on the Windows instance and then select 
the “Get Windows Password” from the popup menu. 

¢ Select the private key and then click on “Decrypt 
Password’. 


Now you can connect to 10.0.2.20 using Remote Desk- 
top Connection. 

Please note that right now only RDC to the Microsoft 
Windows instance will work. Nothing else. If you try to 
browse the Internet or click on Windows Update, the Mi- 
crosoft Windows instance will not be able to access the 
Internet. All its outgoing traffic goes through NAT instance 
but there are no such rules in the “NatSG’” security group. 

For example, to allow HTTP/HTTPS traffic from your Mi- 
crosoft Windows instance, you should add the following 
rules to the NAT instance’s security group: Figure 14. 


Conclusion 

| hope that this article will motivate you to use FreeBSD 
instances in Amazon Cloud. The only downside of this is 
so called “Microsoft Windows tax”. You probably have no- 
ticed that all FreeBSD AMls for non-cluster instances (at 
least the ones that are not EOL yet) run as Microsoft Win- 
dows instances. As the result, you have to pay more for 
using them. (You can find out more details about this on 
the = http:/www.daemonology.net/blog/2012-01-16-Free 
BSD-now-on-all-EC2-instance-types.html webpage). 
Let's hope that in the nearest future Amazon will provide 
a mechanism for running in HVM mode without being la- 
beled as a ‘Windows’ instance. 

And once again, many thanks to Colin Percival (a Free- 
BSD developer, a member of the FreeBSD Core team, 
and the FreeBSD Security Officer) for his work to get 
FreeBSD running in Amazon Cloud. 
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This article continues the previous one, presenting the readers with 
a few index examples and how the access costs are computed by the 
query planner. All the examples shown here have been tested ona 
PostgreSQL 9.1 cluster running on a FreeBSD 8.2-RELEASE machine; 
all the example source code is available in a GitHub repository. 


What you will learn... 
« which indexes can be defined on existing data 
« how to analyze the effectiveness of an index 


be used to explain the basis of database indexes and 

how PostgreSQL performs the cost computation. It is 
assumed that the articles table has just been created and 
populated as described in the previous article in this se- 
ries and that no custom indexes have been created. 

First consider a single-column-filtered query: get all the 
articles of average difficulty (i.e., difficulty = ‘AVG’) and 
see how PostgreSQL handles it (Listing 1). 

The executor wants to perform a sequential scan (Seq 
Scan) that will cost 51667 and will extract 663933 tuples 
out of the original relation. The first step to understanding 
the generated plan is Knowing how the system computed 
the number of output tuples: the system stores statistical 
data in the catalogues and make it available through the 
special view pg_ stats (see Listing 2). Such statistical data 
is fundamental for the executor to choose the execution 


n this section a few example queries and indexes will 


Listing 1. A query when no indexes have been defined 


QUERY PLAN 


Seq Scan on articles 


Filter: (diriculity — “AVG* s:bochar) 


bsdmagdb=# EXPLAIN SELECT title, pages, listings, difficulty FROM articles WHERE difficulty = ‘AVG’; 


(cost=0.00..51667.00 rows=663933 width=38) 


What you should know... 

¢ basic shell commands 

¢ basic PostgreSQL concepts 

¢ server-side programming with PostgreSQL 


plan and the access method. In the case of the difficulty 
column the statistical data reports that there are only 3 
distinct values in the whole table and that such values are 
‘MIN’, ‘MAX’, and ‘AVG’, as expected (see the function in 
Listing 1). 

The most common vals and most common fregs are afr- 
rays whose values are related, and therefore the ‘MAX’ 
value appears with a frequency of 0.3359, the ‘MIN’ val- 
ue with a frequency of 0.332133 and the ‘AVG’ value 
with a frequency of 0.331967. This is the information re- 
quired to Know how many tuples the system is going to 
expect out of a query that filters on the difficulty = ‘AVG’ 
clause: 


OuLpPUL..tuples 
= 2000000 * 0.331967 
= 603934 


= telation votal tuples - Trequency 
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Now it is possible to compute the cost of the sequential 
scan: each data page of the target relation will be read 
with a weight of seq page cost and the system has 
to evaluate each tuple to see if it matches the selection 
condition, and therefore: 


seq. scan. total cost.= relation total pages * seq pave cost 
+ relation total tuples * (cpu tuple cost. + cpu_ 

Operacvor cost) 

Zoo) * | +. 2000000 * 


51667 


(04 OL+0 0025) 


II 


That is exactly the cost reported by the query planner. 
In order to see how things change using indexes, build 
an index on the difficulty column as follows: 


bsdmagdb=# CREATE INDEX idx difficulty ON articles (difficulty) ; 


Similarly to regular data tables, since PostgreSQL 
manages indexes as relations, it is possible to see the 
amount of space occupied by an index: 


bsdmagdb=# SELECT relname, reltuples, relpages 
FROM pg class WHERE relname = ‘idx difficulty’ AND relkind = ‘i’; 
relname | reltuples | relpages 


idx dificulty: | Zet06 | 


The index has the same tuple count of the data relation, 
and this is because the index is full (that is covers each 
data tuple). However, since the index stores a lot less 
information with regard to the data table the number of 
pages is less than that of the real data. 

As shown in Listing 3 the planner now decides to ex- 
ploit the newly created index and accesses the data ta- 
ble using such index. The first operation performed by 
the planner is a “Bitmap Index Scan’, that is a fancy 
way to say “walk the index but don't get out the tuples 
yet, just keep the pointers”; in fact the output width for 
such node is 0 (that is no tuples at all). In this first pass 
therefore the planner walks the index to see which tu- 
ples match the difficulty = ‘AVG’ clause and builds a bit- 
map of pointers to such tuples. For efficiency reasons, 
if the number of matching tuples increases, then the bit- 
map does not keep tuple pointers but page data point- 
ers, that is this pass is “lossy” and on the next step the 
planner Knows that the condition has to be checked 
again (hence the “Recheck Cond” on the second 
node). 

The second node reads the built bitmap and extracts 
the tuples out of the data table (the width now is not ze- 
ro): it is interesting to note that the rows number of the 
two nodes are exactly the same (as expected). It is pos- 
sible to compute the costs manually following similar 
steps as in the previous case: 


Listing 2. Listing 2. Statistical data used by the optimizer 


bsdmagdb=# \x 


PROM pg Stats WIKRE tablename —~ anuicles “AND atunane — 
tablename articles 

attname difficulty 

in eli sic iLiavene 5 


{MAX,MIN, AVG} 
ORS Soe OES 8) Zi se) We seul oie 7 3, 


most common vals 


most common freqs 


Listing 3. The query plan using an index on the “difficulty” column 


QUERY PLAN 


Bitmap Heap Scan on articles 
Recheck Cond: (difficulty = ‘AVG’: :bpchar) 
=7 Siu index Scan Ony Lexea 1 iiculEy 


Index Cond: (difficulty = ‘AVG’: :bpchar) 


Dediagdh=7 ShLECh Gablenane savchame,, A dl sceinel, Moss sCOmmon Vals,smose, common Erede 


Cubic uikicy, - 


bsdmagdb=# EXPLAIN SELECT title, pages, listings, difficulty FROM articles WHERE difficulty = ‘AVG’; 


(cost=10977.91..45944.07 rows=663933 width=38) 


(cost=0.00..10811.92 rows=663933 width=0) 
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output. tuples = relation. total tuples * frequency 
= 2000000. = O35 106) 
= 663934 


The bitmap index scan has to traverse the whole index, 
and therefore has to read each data page of the index 
(sequentially, that is with seq _ page cost) and to apply 
a cpu operator on each of the tuples in the index; there- 
fore the results are: 


bitmap index scan = ( index total pages * random page cost 
+ index total tuples * {( cpu_index tuple cost + 
Cpu-Operator cost ) ) 


* INdex SeleCtiy iicy 


= { 4392 % 4h 2000000 * (0.005 + 0.0025 ).). * 
Os 331967 

= 1/566 + T5000 ) * 0.351967 

= 10811 


and then comes the “Bitmap Heap Scan” cost (See Box 
2), that is rounded to the cost of sequentially retrieve 
pages (skipping not-bitmapped tuples) and applying the 
filter operator (recheck): 


bitmap heap scan = ( relation data pages * seq page cost }) 


+ | Pelation tuples * index selectivity * (cpu. 


tuple. Cost. + Cpu operator cost ) 

( 26007 * 1) » 4 2000000 * 0l3319G7 * 4.0.01 
- sO025 ). “) 

26001 + 6299 

= 34966 


Note that the above is the cost of the “Bitmap Heap 
Scan” node which, as shown in Listing 3, has a final cost 
of around 45944 and an initial cost of around 10977 and 
therefore a single run cost of 45944 — 10977 = 34967 
(some rounding in the results is acceptable). It is inter- 
esting to note that the initial cost of the Heap Scan node 
is not exactly the same of the final cost of the Index 
Scan node but is slightly higher; this is due to the sort- 
ing of the obtained bitmap and to the re-arrangement of 
the in memory structures so that the next node can start 
running. 

Why, having an index on difficulty, does the system not 
use it to immediately retrieve tuples out of the target re- 
lation but performs this “fancy” two step algorithm? The 
idea is the index is not selective enough to justify a pure 
index scan (with the cost of random page cost) and that is 
better to build a pre-ordered map of which data pages are 
required and to access them sequentially. 

Supposing that the difficulty = ‘AVG’ is the normal data- 
base ‘workload’ (i.e., a query that needs to be optimized 


Listing 4. Creation of a partial index to match the workload query 


bsdmagdb=# CREATE INDEX idx difficulty avg 

ON artteles(diinculty) 

WHERE difficulty = ‘AVG’; 

bsdmagdb=# SELECT relname, reltuples, relpages 

PROM egeclass 

WERE eselmeane == 20x eda ticity save’ SAN) welkind = — 177 
relname | reltuples | relpages 


Ikebe einewl ley elie) 666666 | 


Listing 5. Executing the workload query with the partial index 


QUERY PLAN 


Bitmap Heap Scan on articles 


= Silvia Index scan omidx di ticnley ave 


Index Cond: (difficulty = ‘AVG’ ::bpchar) 


bedmagdb—7 WPDATE = pGlinde< sei indusvalid — false WHER andexrelid — “dx jdiGieuity “.egcelass, 


bsdmagdb=# EXPLAIN SELECT title, pages, listings, difficulty FROM articles WHERE difficulty = ‘AVG’; 


(cost=11031.27..46046.61 rows=667867 width=38) 


(cost=0.00..10864.30 rows=667867 width=0) 
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since it is executed very often), then it is possible to im- 
prove performance by building a partial index that covers 
only this exact condition. Listing 4 shows the creation of 
the partial index and the size of the results: it is interesting 
to note that the index has now 1466 pages (against the 
4392 of the full one) and a lot less tuples. 

Listing 5 shows the execution plan to access the same 
data (i.e., difficulty = ‘AVG’) with the partial index in place. 
As readers can see the query plan has not changed, that 
is again a two step plan is selected, with a slightly differ- 
ent cost with respect to the previous case. The cost can 
be computed in a similar manner to the previous case, 
without having the index selectivity in the first node com- 
putation (since the index is 100% selective); what chang- 
es with respect to the previous case is the cost of the first 
node (Bitmap Index Scan): 


( 1406 * 4 + 66606606 * ( 0.005 + 0.0025 ) ) 


= ( 5864 + 5000 ) 
= 10864 
bitmap heap scan = { relation data pages * seq page cost }) 


+ | Teleation tuples-* index selectivity * {1 epi. 
tuple Cost + €pu operator cost } 

( 2ooo7*.— .) = 4¢ ZO00000 @ OU. 331967 * ¢ O.01 
+ O.0025.)) 

26007 + 8299 


34966 
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Therefore, making a partial index on this particular query 
does not improve performance, and even if the resulting 
index is smaller, it still does not filter enough to be cho- 
sen as direct access method. 

The Bitmap Index Scan is often better than a pure Index 


scan, since the latter could imply reading the index ran- 
domly more than one time, while the Bitmap Index Scan 
sorts the bitmap in order to allow for sequential data re- 
trieval. Therefore the Index Scan is used for very selective 


bitmap index scan = ( index total pages * random page cost 
+ index total tuples * { cpu. index tuple cost + 


Cpu operator cost.) ) 


Listing 6. An Index Scan fora single row query retrieval 


bsdmagdb=# EXPLAIN SELECT title, abstract FROM articles WHERE pk = 1; 
QUERY PLAN 


index ssecan Using "ateteles okey sOneannre les sicose—0 00 ota rows wich o7 ) 


Index Cond: (pk = 1) 


Listing 7. A two-tuples query that uses a Bitmap Index Scan 


bsdmagdb=# EXPLAIN SELECT title, abstract FROM articles WHERE pk = 1 OR pk = 1000000; 
QUERY PLAN 

(cost=8.87..16.85 rows=2 width=62) 

((pk = 1) OR (pk = 1000000) ) 

(cost=8.87..8.87 rows=2 width=0) 


Bitmap Heap Scan on articles 
Recheck Cond: 
= lsulieinte Oe 

>> bi tiapmindex scan On articles pkey (cost — 500 3.443 novs—! widin—O) 

Index Cond: (pk = 1) 

= Slide index Sean On cds ule lesuekey ss (cose—) U0. 4 43. rows i waden—0)) 


Index Cond: (pk = 1000000) 


Listing 8. A query similar to the previous one but with a different set of clauses 
bsdmagdb=# EXPLAIN SELECT title, pages, listings, difficulty 
FROM articles WHERE difficulty <> ‘MIN’ AND difficulty <> ‘MAX’; 
QUERY PLAN 
(cost=0.00..56667.00 rows=883552 width=38) 
(de tieulty <> “MIN + 2bochar) ANDS (dithculbiy <— “MAX y:bpeltar)) 


Seq Scan on articles 


Filter: 
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queries (just a few rows). As an example consider a query 
that retrieves exactly one tuple: since the system builds 
a BTree for the primary key, the query of Listing 6 shows 
an Index Scan. The cost can be roughly computed as a 
random page cost for the index page and the data page, 
and therefore almost 8. 

As an example, consider the query of Listing 7 that asks 
for two different rows via the table primary key: the optimiz- 
er chooses to perform a Bitmap Index Scan instead of an 
Index Scan. In such case the cost of the nodes can be com- 
puted in a very simple manner: having to access two dif- 
ferent index entries on different index pages cost random __ 
page_cost times two, and then comes the cost of retrieving 
the two data tuples (which could be in different pages), and 
therefore the overall cost is that of accessing four data pag- 
es at a random cost, and therefore 4 * 4 = 16. Then there 
is a little addition for the CPU costs and the need to do the 
OR between clauses, as shown in Listing 7. 

It is interesting to note that the query planner can be 
confused by having the same query run with the wrong 


set of clauses: consider the plan for the query of List- 
ing 8 that performs the same as the one of Listing 5 but 
without using the same indexes. The planner does not 
know that the {MIN, MAX} values are exclusive with re- 
gard to the {AVG} ones, and therefore performs a se- 
quential scan with the already explained cost. This ex- 
ample emphasizes how often the DBA work is not only to 
find out the best access method to the data, but also to 
rewrite a “bad” query into one that the planner can bet- 
ter understand. 

In order to explore more the PostgreSQL planner and 
index tools consider the creation of an index on the list- 
ings column and the query for all the tuples with the listing 
= 5 clause (with a frequency of 0.036633); this is a case 
very similar to the previous ones and as shown in Listing 
9 the resulting index and value frequencies lead to a Bit- 
map Heap Scan. 
bitmap index scan = ( index total pages * random page cost 


+ Index total. tuples ™ { pu index Tuple. Cost: + cou 


Listing 9. Creation of an index on the listings column 


bsdmagdb# CREATE INDEX idx listings 
ON -artireles( listings); 
bsdmagdb=# SELECT relname, reltuples, relpages 
PROM pg class 
(GUE OF nafedliere iit = Sallebc ) Ikatisjwabione fs. / JANI ie(syL<sLigvel) 9 15! 
relname | reltuples | relpages 

eee tensa Se ae ee aa ee aS 

rox lvstanes || 


bsdmagdb=# \x 


Oi 


igi elausic LiaveNc 


most common freqs 


bsdmagdb=# \x 
QUERY PLAN 


Bitmap Heap Scan on articles 
Recheck Cond: (listings = 5) 
=) Biviaee Index Scan som id jive rings 


index Cond; (lvstings =.) 


PROM eg ystats With tablenames—— artrcles SANDRatuname = 

a SECORD i a———— lca a ac a a ee ari EL 
tablename | articles 

attname Pears mes 


bsdmagdb=7 SELECT tabléname, attname, n distinct, most common vals, most common freqs 
listings’; 


OE, Seoiio MUP MC Ses Ooo) ees q IGloy (0) (0). 7 ILS Sis) 90) Cerone Oe OS TolOG 1 31) OS toGsio sya Gio4t IO) OS GIRO Wel eysio) 0 07 10 


(coc — ly Ae COS icmtewe— (570 / Wicima—o7) 


(cost=0.00..1193.93 rows=73267 width=0) 
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operator cost ) ) * index selectivity 
(aoo2 * 4 + 2000000. © 4° 0.005: +0 s0025. ) .). * 

0.036633 
( 17568 + 15000 ) 
= 1193 


II 


* 0 3036633 


bitmap heap scan = ( relation data pages * seq page cost }) 
+ ( relation tuples * index selectivity * {-cpu_ 
tuple cost + Cpu operator cost ) 
( 26067 * Lj) + { 2000000 * 0.036633 * ( 0.01 
+ 0025 Jo) 
26667 -+ 916 
= 27583 


total cost = 27583 + 1193 = 28776 


II 


How does the plan change if in a single query there 
are both clauses listings = 5 and difficulty = ‘AVG’? The 
planner decides to use a direct index access, as shown 
in Listing 10. Why does the planner not use a Bitmap 
Heap Scan on one of available indexes? If the plan- 
ner does, the index with the minimal cost would be the 
idx listings, which would have a total cost of near 
28776; then the cost for the filtering condition difficulty 
= ‘AVG’ has to be applied, and this results in a CPU op- 
eration applied on each output tuple of the index above, 
therefore increasing the cost of the whole access as 
follows: 
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bitmap heap scan = 20176 + (© total data tuples *andex — 
selectivity * cpu. Operator cost ) 
2o716* 4. 2000000 * 0.036633 7 0.01 -) 


29508 


II 


which is greater than the cost of a pure Index Scan 
(28188). The latter can be approximated to the worst 
case that is: 


index scan = random page cost * total index pages 
7 \ total andex tuples “cpu Tuple index cost: | 
= { TOtal data tuples * index selectivity * ( 
cou-Ttuple cost .) 
+ Cpl Operator cost. * number of clauses ) 
= L7Ss60 + 10000 + 1006 
28666 


II 


It is possible to create a compound index that filters on 
both the clauses, as shown in Listing 11. Supposing a 
Bitmap Heap Scan, the cost of using such index for the 
query of Listing 10 would result as follows (note that the 
selectivity is total and that the index can be read sequen- 
tially): 


bitmap index scan = ( index total. pages. * seq page cost 


+ Index total tuples. * ( cpu 1ndex. tuple. cost + 


Listing 10. A query that has both the clauses 


FROM articles WHERE difficulty = ‘AVG’ and listings = 5; 
QUERY PLAN 


index Gan Using dx vo Titcwtysa ye. Onmatrle kes 
Index Cond: (difficulty = ‘AVG’ ::bpchar) 
Filter: (listings = 5) 


Listing 11. Creation of a compound index 


WHERE listings = 5 AND difficulty = ‘AVG’; 
bsdmagdb=# SELECT relname, reltuples, relpages 
PROMpegeclass 


relname | reltuples | relpages 
Se Se SS Se SSeS eS SS SSeS S SS Sa SS Sea SS Se +—--—----——-—-—--—-—-+4---------- 


1d x casiicwlizy vayg listings so: )| 


bsdmagdb=# EXPLAIN SELECT title, pages, listings, difficulty 


(cost=0.00..28188.14 rows=24561 width=38) 


Dediagdo= 7 (Cha Alh sar Gao aiOu ny pave liseings sO a nede les (ciminCUl ay, msm ims) 


WHERE eelneane== Hox cd iiculty tay listings) 3”) ANDerelkmnd =) “i 


www.bsdmag.org 


BSD :: 


MAGAZINE 


HOW TO 


2 * cpu_operator cost ) ) Therefore the total cost is nearly the same as the one 
= (185 * 1 + 66664 * ( 0.005 + 0.0025 ) ) obtained using another of previous indexes, and there- 
= ( 185 + 333 ) fore it is not worth creating this index for this data set. 
= 516 This is confirmed by Listing 12, that shows how the plan- 

bitmap heap scan = ( relation data pages * seq page cost ) ner decides to use the normal index instead of the com- 
+ ( index tuples * ( cpu tuple cost + 2 * cpu_ pound one and how forcing the usage of the compound 
operator cost ) index does not provide a great improvement in perfor- 
= ( 26667 * 1) + ( 66664 (0.01 + 2 * 0.0025) ) mance. 
= 26667 + 1000 Plans with a single relation can be easy enough to read 
= 27667 and understand, but as more tables are involved in a set 
total cost = 27667 + 518 = 28185 of joins the plan can become very complex. In order to ex- 


Listing 12. The cost of the access with the idx_listings and idx_difficulty_avg_listings_5 indexes 


bsdmagdb=# EXPLAIN SELECT title, pages, listings, difficulty 
FROM articles WHERE difficulty = ‘AVG’ and listings = 5; QUERY PLAN 


index ‘Can Using Vdx TOT iticuliysayvg "On areLteles (cost—0 00] = 7olee ss nows—24 560i widith—33) 
Index Cond: (difficulty = ‘AVG’: :bpchar) 
Filter: (listings =. 5) 
bsdmagdb=7 set enable indexscan TO false; 
bsdmagdb=# EXPLAIN SELECT title, pages, listings, difficulty 
FROM articles WHERE difficulty = ‘AVG’ and listings = 5; 
QUERY PLAN 


Bitmap Heap Scan on articles (cost=528.01..28100.18 rows=24561 width=38) 
Recheck Cond: ((difficulty = ‘AVG’::bpchar) AND (listings = 5)) 


=) Shemape inde c scan Om, td ci bic Saye liordncc yo me (cCct—U, (0K 2 hi) Veen c—24 joi) walehein— 0) 
Index Cond: ((difficulty = ‘AVG’::bpchar) AND (listings = 5)) 


Listing 13. A join query plan 
bsdmagdb=# EXPLAIN SELECT al.title, al.listings, al.difficulty 
FROM artichkes al JOIN articles a2 ON alispk = az. pk 
WHERE al.difficulty = ‘AVG’ 
AND ag li Sstumegs = 95; 
QUERY PLAN 
Hash Join (cost=29961.92..72477.02 rows=24562 width=34) 
Hash Cond: (alzok = azepk) 
a ides scaly Us tig tdx di Eiculiylayg one amtrcles tale (cost—0r 00 e226 sri) erows—o 0267] widthn—3 6) 
index (Conds (datiiculty = “AVG =: bpchar) 
-> Hash (cost=28795.08..28795.08 rows=7/73267 width=4) 
-> Bitmap Heap Scan on articles a2 (cost=1212.24..28795.08 rows=73267 width=4) 
Recheck Cond: (listings = 5) 
=) Silvia Index sean on id isting s. (cost —0 0025 )32 3 rows—)15207 wiadtun—0) 
Index Cond: (listings = 5) 
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plain some join concepts consider a query that joins the 
articles table against itself in order to get all the articles 
with a difficulty = ‘AVG’ and a listings = 5 as shown in 
Listing 13. The plan starts from the Bitmap Heap Scan 
(or better from the Bitmap Index Scan subnode) on the 
idx_listings index for the relation a2 in order to get all the 
listings = 5 tuples; the cost is the same computed in the 
previous examples for this kind of plan. The Bitmap Heap 
Scan produces a set of tuples made only by the attribute 
of the join clause, that is the a2.pk attribute (as confirmed 
by the length of 4 bytes, an integer value). The keys are 
then placed into an hash map in a similar way to what 
happens during a Bitmap Heap Scan (first phase). At the 
same time, on the relation a7 an Index Scan is performed 
to extract all the difficulty = ‘AVG’ tuples; the cost is the 
same as computed in the previous examples. The out- 
put of this node is then compared with the whole hash 
map computed at the previous step in order to see if the 
tuple can be reported in output or not (See Figure 1). The 
cost of the whole plan depends on the work_mem size 
that should be large enough to keep in memory the whole 
hash map, otherwise multiple “batch” passes are required. 


Statistics Target 
The statistical data used by the query planner is computed 
depending on a configuration parameter called statistics 
target or statistics for short. Such parameter expresses 
how many data samples must be kept in the per-column 
Statistics and therefore how the sample itself results accu- 
rate. Having a good statistical data is fundamental for the 
planner to take the right decision. 

In order to better explain how the statistics work con- 
sider the listings column of the article table. As shown 
In Listing 9 the most common vals and most common freqs 


PostgreSQL: Indexes (Part 2) 


for the listings column includes all the ten values with 
their distribution, and this is the result of having a statis- 
tics greater or equal to the number of possible values. By 
default the statistics on each column is set to a value of 
100 (since PostgreSQL 9), that means that no more than 
100 samples of data will be kept. It is possible to change 
the value of the samples to be kept using the aLTER TABLE 
ALTER COLUMN Command, as shown in Listing 14. For in- 
stance, setting the statics target to 5, as shown in Listing 
14, produces the effect that the most common values and 
most common freqs Of the listings column become a five- 
entries array therefore excluding some of the values that 
exist in the data set. 

The side effect of this wrong statistical target is that a 
query plan could lead to incorrect results: the same query 
of Listing 9 (with the clause difficulty = 5) is still execut- 
ed via the index idx_listings but the number of rows ex- 
pected as output is very different from the real value (as 
shown in Listing 15): the planner believes that no more 
than 10240 rows match the clause, while the execution 
shows that 73309 rows do. As readers can imagine, hav- 
ing wrong statistical data will lead to a wrong query plan 
estimation that can lead to the wrong access path selec- 
tion and therefore a non-optimal execution. 

Of course it does not make sense to increase the num- 
ber of statistical targets over the number of distinct values, 
and therefore for the listings column having more than 
101 values does not affect the query planning but could 
affect the time of the vacuum anaLyze and other anatyze 
commands. 


When and Where Placing an Index? 
The most difficult job of a DBA is to understand when and 
where an index (of any kind) can improve query execu- 


articles 


idx Listings 


a je 
t 
ae 
Lj! & 
oo 


Hash Join 


zn 
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® 
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idx difficulty avg 


Figure 1. A visual representation of the Hash Join (from PgAdmin III) 
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tion. It is worth reminding that each index requires main- 
tenance and can slow down vupDATE, INSERT @Nnd DELETE 
commands that need to modify such index. Therefore not 
used or not useful indexes should be removed from the 
database. 

The first step in deciding when and where an index 
could be required is to identify slow queries, or better, 
queries that can slow down the whole application. In 
fact, it could be that a query that is executed quite fast is 
repeated so many times on different data (not cached) 
that the whole application performance results bad. The 
database administrator has to “observe” the database 


while it is working to get the most frequent queries or 
those that are too slow. There are different tools, like 
pgfouine (see Box 3), that can help the DBA analyze the 
database logs and report which queries are most often 
executed. 

Out of the box PostgreSQL provides a configuration pa- 
rameter, log min duration statement, which can be used 
to log any query whose execution time requires more than 
the specified number of milliseconds. For instance setting 
the following: 


log min duration statement = 1000 


Listing 14. Decreasing the statistcs target 


bsdmagdb=# \x 


bsdmagdb=# VACUUM FULL ANALYZE articles; 


FROM articles WHERE listings = 5; 
QUERY PLAN 


Bitmap Heap Scan on articles 


Recheck Cond: (listings = 5 


=> Bitter index sean Cnid (iistings 
Index Cond: (listings = 5) 


Listing 16. An example of report on index usage 


PROMPeg estat a tiindexes 


bsdmagdb=# ALTER TABLE articles ALTER COLUMN listings SET STATISTICS 5; 


bsdmagdb-=7 SELECT tabléename, altname,;, nm distinct, most common vals, most common freqs 


PROMP eg estates Wikre rablename ——s articles ~ANDPatiname ——  listance | 
Se COR Da We amo ae ae oS ee ee Bae De ae a a ee a ee 
tablename articles 

attname ILabene ahiale(s 

idl elaesic Liavene Oe 

most common vals POn oy Syl co 

Most (COMmMen TEneds OP 57 A677, OOS 83007) Um 0 SiO 37 65330037 1333% 


Listing 15. A query plan that has the wrong number of output rows due to wrong statistical target 


bsdmagdb=# EXPLAIN ANALYZE SELECT title, listings, difficulty 


(cost=171.79..20035.29 rows=10240 width=34) 
(actual time=124.177..1872.909 rows=73309 loops=1) 


(cost=0.00..169.23 rows=10240 width=0) 
factual time=106.112..106.112 rows=73309 loops=1) 


Sabet) anidexgelmane sidx sean, tex stlp read, 10x tue) mete 


WHERE relname = ‘articles’; 

~------------------------------ $----------}--------------4--------------- 
LOXNe PENCUlGY cavg | | 547 T5o 1 | SA 7a S ot 
idx listings | 9 | 440064 | 3 
1dx ya Bicuiy vavg lustangs 5) | Oy) 0 | 0 
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Box 1. Query hints 

PostgreSQL does not allow DBAs and users to tell the executor 
which path to choose in order to perform a particular query. This 
kind of executor suggestions are known as “query hints” and are 
usually available on commercial databases. The reason why Post- 
greSQL does not support them is that usually query hints are used 
in a very bad way: typically a DBA hints the executor to use an index 
because he thinks that such index will solve any performance prob- 
lem, but it turns out that the optimizer is better at doing the path se- 
lection job. Nevertheless, while it is not possible to hint the executor 
to use a specific index, it is possible to force it to not use an access 
method. This behavior is obtained using the tunables enable _ 
xxx that can be found in the postgresql.conf configuration file and 
that can be set also on the psql command line as follows: 


secinecicl=7 Sm eoelole Sscecem = Yomi’: 
gscineciclo—=2 SM) Seclols Seeeceil = Yew’ ¢ 


The available access method tunables are the following: 


enabile bimmapsean 
enable hashagg 


enabler tach join 


enable windexs¢an 


enablewierer ial 


enablemiengejoin 
enable neseloop 
enaolewseqsceam 


emelo ley eyeucie 


enabler tidscan 


Box 2. How is a Bitmap Heap Scan cost 


computed? 

Assuming the simplest case of a single index scan (i.e., no loops), the 

cost of a Bitmap Heap Scan is computed pretty much as follows: 

¢ compute the tuplesFetched on the data table as 
tuplesFetched =reltuples x indexSelectivity 

¢ compute the maximum number of pages of the data table T 
(one or greater); 

¢ compute the number of pagesFetched using the Mack- 
ert and Lohman formula, so that if the number of tuples be- 
comes greater the number of relation pages dominates, oth- 


erwise the number of tuples dominates: 


(2XT XtuplesFetched) 
i. dill (2 T+tuplesFetched) 


¢ round pagesFetched so that it is not greater than T; 

- if pagesFetched is less than 2 (very small number of tuples, 
that is the number of pages are dominating) the data page 
access cost is randomPageCost, otherwise the data page ac- 
cess cost is computed with the following formula that has the 
lower limit at seqPageCost: 


costPerPage = randomPageCost —(randomPageCost — 


0.5 


pagesFetched ) 

a 

- so that the final I/O cost is done by the number of pages- 
Fetched multiplied by the costPerPage and the overall cost is 
done by the I/O cost and the CPU cost: 


seqPageCost)X( 


totalCost =costPerPage X pagesF etched 
+(tuplesFetched X(cpuTupleCost +cpuOperatorCost: 


numOfWhereClauses )) 


For more info see the source code of the function cost bitmap _ 


heap scan in the file src/backend/optimizer/path/ 


COSstei Ze rc, 


Box 3. PgFouine 

PgFouine is a tool that helps administrators find problematic 
queries exploiting the database logs; in fact the project defines 
itself as a “log analyzer”. The software can be installed from the 
database/pgfouine ports or as a package as follows: 


# pkg add -r pgfouine 


It is important to instrument the PostgreSQL cluster to log data in 
a way that PgFouine can understand; at the time of writing PgFo- 
uine accepts logs from syslog, stderr and in CSV format. An easy, 
even if not recommended, setup of PostgreSQL for a quick use of 
PgFouine is as follows (file postgresql.conf): 


Tog) eters tide cao = es reir! 
legging eCollectous— son 

silene mode — on 

leg) nice tomy = yore Wee 
LOG mil enane= = Pesegresd log” 
Fog min yelukeainvom Weearemen: ——() 
Ogee meno rei) cumc) | loi 

that will produce a file SPGDATA/pg_log/postgresql.log with 
all the executed statement. It is then possible to analyze the re- 
sult using PgFouine as follows: 


> /usr/local/bin/pgfouine -file $PGDATA/pg_ log/postgresql.log 
-format text -logtype stderr 


which will produce a text output similar to the following 
##¢#¢# Overall statistics ##### 


Number of unique normalized queries: 2 


Number of queries: PZ 
Total query durauvon: 5:85 
t#tee Queries by type ##### 


SHiEC Ws WZ 1005103 

t##ee Most frequent queries (N) ##### 
SOs ohn er COUNEC | EROMemagaa aie; 

De OOS OU C lec COUNT (s) MEROM tart iekec. 


ama Tat tH 
L286 = 2 >> SELECT COUNT (*) EPROM arrvclics: 
O00 == 0 = SELECT “COUNT (2) 


Slowest queries (N) 
FROM magazine; 


The report from PgFouine shows that, since logging has been ac- 
tivated, the system has executed 12 SELECT statements and re- 
ports which ones were the most frequent and which the slowest. 
This information is very helpful for DBAs in order to catch proble- 
matic queries and their frequency. 


www.bsdmag.org 
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On The Web 


PostgreSQL official Web Site: http:/www.postgresgl.org 
ITPUG official Web Site: http://www. itpug.org 


PgFouine: http://pgfouine.projects.postgresql.org/ 


while produce a log entry like the following: 


LOG? duration? 1418.252 ms 
listings, difficulty FROM articles WHERE listings = 5 AND 
difficulty = ‘AVG’; 


statement: SELECT title, 


Depending on how fast such a query should run, the ad- 
ministrator can inspect further to see if an index can be 
used to improve the query execution speed. 

Having identified slow or problematic queries will let 
the administrator take decisions on which indexes to 
build. Here come into play all the other commands, like 
EXPLAIN and ANALYZE (to adjust Statistical data). It is pos- 
sible to disable one access method (see Box 1) in or- 
der to test if an index of a particular type can provide 
a real performance gain or to see why it is not used 
(i.e., the cost of using such index). Moreover, it is pos- 
sible to inspect the pg stat all index catalog to get in- 
formation about which indexes are used and how from 
the running system. Listing 15 shows an example of in- 
dex reporting through the pg _ stat _all_ indexes where it 
is possible to see that the partial index idx difficulty 
avg_listings 5 has never been used since its creation 
and therefore could be removed without altering actual 
performances. 


Note 

Please note that values extracted from pg stat all_ 
indexes Could be different depending on when the statisti- 
cal data is extracted and which queries have been execut- 
ed against the example data table). 


The idx scan column reports the total Index Scan, 
the idx tup fetch is the number of tuples retrieved 
through the index, and idx tup_ read Is the index tu- 
ples read by index usage. All the numbers are abso- 
lute, that is are counting since the index creation. For 
instance, as of Listing 16, the idx listings has been 
used mainly for Bitmap Index Scans, since the idx _ 
tup _fecth IS almost zero and the idx tup_ readisa lot 
greater; in other words the index has been walked but 
no data tuple has been retrieved directly using index ref- 
erences. On the other hand idx difficulty avg has 
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PostrgeSQL Explain Documentation: http:/www.postgresql.org/docs/current/static/sql-explain.html 
GitHub Repository containing the source code of the examples: https://github.com/fluca1978/fluca-pg-utils 


been used for Index Scans, since the number of index 
tuples (i.e., index walking) and the number of data tuples 
retrieved coincide. 


Summary and Coming Next 

This article completed the glance at indexing and cost 
computation. Main tools to understand which decisions 
the query planner took to execute a query has been 
shown. It is worth noting that finding the query optimiza- 
tion problem is very complex and hard to solve, and ex- 
perience is invaluable in this scenario. In the next articles 
other PostgreSQL features will be shown. 
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FreeBSD Enterprise 


Search with Apache Solr (Part 3) 


Continuing with our series on Apache Solr we will look at 
crawling the bsdmag.org website with nutch. 


What you will learn... 
¢ How to set up Apache Nutch to crawl websites and inject the con- 
tent into a MySQL database ans search with Solr 


ne of the important facets of enterprise search is 
C) to be able to search internal (Intranet) and exter- 
nal websites. On a smaller scale, it is relatively 
trivial to assemble some code in PHP or Perl to pull web 
pages from a site, extract the links from the HTML and 
then “wash, rinse, repeat”. The difficulty arises when we 
want to index, rank, and effectively manage these results 
on a large scale. Almost 10 years ago, Apache Nutch was 
developed as the key technology to crawl 100 million web- 
pages, and has proved time and again that it is an efficient 
scalable solution. Nutch can be clustered, it is robots.txt 
friendly, and using modular plug-ins and schemas, can be 
tuned to bias certain results first. While Nutch integration 
and tuning is quite specialized, it is fairly trivial to config- 
ure Nutch to dump results of a crawl session into MySQL 
(or any other JDBC based database for that matter), and 
rank / review these queries in Solr. 
For more information about Apache Nutch, please see 
the Nutch Wiki, which can be found at http://wiki.apache. 
org/nutch. 


Requirements 

We will continue to develop our Solr installation from the 
last article. Another core will be added, and test data im- 
ported from a MySQL database. While | will be using the 
vi editor, substitute the editor of your choice if preferred 
(e.g. emacs, nano etc.). 
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What you should know... 
¢ BSD administration skills, FreeBSD Apache Solr Part 1 and Part 2 arti- 
cle 


Step 1. Get the Files 


S. 30 


# pkg add -r apache-ant 
If you see: 


pko-add: warning: package “apache-ant=-1.8.2 1” requires 

‘javavmwrapper-2.3.5’, but ‘javavmwrapper-2.4’ is installed 

pkg add: can’t open dependency file ‘/var/db/pkg/jdk- 
1.6.0.3p4 27/+REQUIRED BY’! 


dependency registration is incomplete 
This dependency error can be ignored since apache ant 


runs under javavmwrapper-2.4. 
Install wget and bash if they are not already installed: 


# pkg add -r wget 
# pkg add -r bash 


Download Nutch: 


=e 


mkdir /tmp/nutch 


Se 


cd /tmp/nutch 


Se 


wget http://mirror.rmg.io/apache/nutch/2.1/apache- 


NvcCh=Z else. tar o2 


=He 


tar “xXvZi apecne-nuten-2.1-sre.tar.g2 
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Step 2. Configure MySQL 


# /usr/local/etc/rce.d/mysql-server stop 
# cd /var/db/mysql 


Restart MySQL and check for UTF support (Figure 1). 


# mv my.cnf my.cnf-000 
# cp /usr/local/share/mysql/my-small.cnf ./my.cnf 


# vi my.cnf 
Add under [mysqld] section: 


character-set-server = utf8 
GoLlation—server=ubie unacode °c1 

# /usr/local/etc/rc.d/mysql-server start 
# mysql -uroot 

mysql> \s 


Quit from MySQL 
mysql> \q 


Create a nutch.sql command file with the following con- 
tents: Listing 1. 
Create the database used for Nutch: 


# mysql < nutch.sql 


Step 3. Configure Solr 
Stop Tomcat and create a new collection 


# /usr/local/etc/rc.d/tomcat7 stop 
# cd /home/solr 


# cp -R collection2 collection3 


Edit solr.xm1 to reflect the new collection by adding the 
following lines: Listing 2. 
Flush the index data: 


Figure 1. MySQL with UTF8 support 


www.bsdmag.org 


# rm collection3/data/index/* 
# rm collection3/data/tlog /* 


# cd collection3/conf 


Modify the following files: 


Listing 1. SQL 


CREATE DATABASE nutch; 

Ula sabhee la: 

CREATE TABLE ‘webpage’ ( 

“10d” Ivarchar(76/) (CHARACTER SEE latinlk NO NUL, 
‘headers’ blob, 

‘text’ mediumtext DEFAULT NULL, 

"Status unt (ll) DEFAULT NU, 

‘markers’ blob, 

“DarSestatus” blob, 

‘modifiedTime’ bigint(20) DEFAULT NULL, 

‘score’ float DEFAULT NULL, 

*“eEyO’ Varchar(3Z) CHARACTER SET latin! DEFAULT NULL, 
‘bpaseUrl’ varchar(512) CHARACTER SET latinl DEFAULT NULL, 
‘content’ longtext DEFAULT NULL, 

Stivbke’ “Varchar (2046) DEFAULL NULL, 

‘reprUrl’ varchar(512) CHARACTER SET latinl DEFAULT NULL, 
‘fetchInterval’ int(1ll) DEFAULT NULL, 
‘prevFetchTime’ bigint(20) DEFAULT NULL, 

‘inlinks’ mediumblob, 

‘prevSignature’ blob, 

Youtlinks’ mediumblob, 

‘fetchTime’ bigint(20) DEFAULT NULL, 
‘retriesSinceFetch’ int(1l1l1) DEFAULT NULL, 

Nene OeolcollSieelicitis' ~ lellele), 

Ysignature’ bilob, 

‘metadata’ blob, 

PRIMARY. hy ie aid") 

) ENGINE=InnoDB DEFAULT CHARSET=utf8; 


GRANT SELECL, INSERL, UPDALE VON nucch. > = LO 
“hutch € localhost “IDENT PED Bx 


“password ; 


Listing 2. XML 


<core schema="schema. xml” 
instanceDir="/home/solr/collection3/” 
name="collection3” 
contg—“solreonnc, xml” 
dataDir="/home/solr/collection3/data” 

fe 
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data=contd. xml 
This should contain the following: Listing 3. 


schema.xml 

Remove everything between the <fields> tags and de- 
lete all <copyField> configuration so that the schema.xml 
looks like: Listing 4. 


solrconfig.xml 
Modify the data import handler thus: 


<str name="config”>/usr/home/solr/collection3/conf/data- 


cong, xml</str> 


Figure 2. Solr empty collection 3 


a, 
ir = 


So 


oFodie 


Jie 


Figure 4. So/rconfig.xml for collection3 
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Ensure that the /etc/hosts file has a valid host name as- 
signed to local host otherwise nutch will fail: 


27 2020 1 localhost solr 


Reboot: 
# reboot 

Check that the hosts file is correct: 
S ping -t3 eolr 


You should get a reply. 

Check that you have a collection with no documents in- 
dexed by accessing the solr management interface with 
your browser at http://vouripaddress:8080/solr (replace 
youripaddress with the IP address of your SOLR box). 

The schema and data import handler should look like 
this: Figure 3. 


Listing 3. XML 2 


<dataConfig> 


<dataSource driver="com.mysql.jdbc.Driver” 
Url jVdbcrvmysqil:/ localhost, nurch” 
user="nutch” 
password="password” 


is 


<document mame= “webpage > 


<entity name="webpage” 
query="select ID, TITLE, STATUS, BASEURL, CONTENT,M 
ETADATA from webpage”> 
<field column="ID” name="id” /> 
<field column="TITLE” name="title” /> 
<field column="STATUS” name="status” /> 
<field column="BASEURL” name="baseurl” /> 


“ieldvecolumn= COMME snene= Comment. 7 > 


“held colunn= METADATA” name="mertadaua’ 7 > 


</entity> 


</document> 


</dataConfig> 
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Step 4. Build and Configure Nutch 


S su 


# cd /tmp/nutch/apache-nutch-2.1 


Edit the ivy/ivy.xm1 file removing comments on lines 
105 and 107 enabling MySQL support so line 105 is as 
follows: 


<dependency org="mysql” name="mysgl-connector-java” 


rey="5. 1.18" cont=}"*—Sderaulr” /> 


Edit the conf/gora.properties file and comment out the 
default Sqlstore properties, adding MtSQL as the data 
store so the file reflects: Listing 5. 

Copy the nutch defaults across for our site crawl: 


ep conf/nutch-default.xml. conf/nutch=site., xml 


Edit line 63 to reflect a sensible user agent as this can- 
not be empty: 


<value>FreeBSD Test</value> 
Build nutch: 
# ant runtime 


This will take a few minutes as the jar files etc. are down- 
loaded and nutch built. 


# cp -R runtime/local /home/solr/nutch 


# cd /home/solr/nutch 
Edit the first line of bin/nutch to match the following: 


#!/usr/local/bin/bash 


Listing 4. XML 3 


<fields> 


—/ Telds= 


<uniqueKey>id</uniqueKkey> 


<copyField source="id” dest="text”/> 
<copyField source="title” dest="text”/> 
<copyField source="status” dest="text”/> 
<copyField source="baseurl” dest="text”/> 
<copyField source="content” dest="text”/> 


<copyField source="metadata” dest="text”/> 


Listing 5. XML 4 


#gora.sqlstore.jdbc.driver=org.hsqldb.jdbc.JDBCDriver 


#gora.sqlstore.jdbc.user=sa 


#gora.sqlstore.jdbc.password= 


gora.sqlstore.jdbc.driver=com.mysql.jdbc.Driver 
Gotta. solstore. jdoc.url—jdbe-mysal: //local host +3506/nucen? 
gora.sqlstore.jdbc.user=nutch 


gora.sqlstore.jdbc.password=password 


Ciei¢iieMe= text styCe- bee general indexed tiile Toboted= thie 


<field name="id” type="text general” indexed="true” stored="true” /> 
<field name="title” type="text general” indexed="true” stored="true” /> 
<field name="status” type="int” indexed="true” stored="true” /> 

<field name="baseurl” type="text general” indexed="true” stored="true”/> 
<field name="content” type="text general” indexed="true” stored="true”/> 


<field name="metadata” type="text general” indexed="true” stored="true”/> 


multiivalticd="true /—> 


+}dOtd, s¢leuore, jdbc. Url— ,Obe:nsqlda nsolt7// localhost nuzchiest 


\ CreateDatabaseltNothxist—true 
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Add sava_ Home at the beginning of the file before 


cygwin=false 
JAVA_HOME=/usr/local 


Set up the nutch seed file (Replace http://bsdmag.org/ if 
desired): 


# mkdir -p urls 
# echo 'http://bsdmag.org/' > urls/seed.txt 
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Figure 5. Successful data import for collection3 
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Figure 6. Search query for Apache Solr 
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Figure 7. Content from http://bsdmag.org 
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Step 5. Crawl! 
# bin/nutch crawl urls -depth 3 -topN 5 > & logs/crawl.log 
Open another terminal: 


§ cd /home/solr/nutch/logs 
S su 


# tail -f crawl.log 
You should see the site being crawled: 


fetching http://bsdmag.org/magazine/1815-network-security-auditing 
10/10 spinwaiting/active, 5 pages, 0 errors, 0.3 0.2 
pages/s, 57 54 kb/s, 5 URLS in 1 queues 

fetching http://bsdmag.org/news 

10/10 spinwaiting/active, 6 pages, 0 errors, 0.2 0.2 pages/s, 
61 78 kb/s, 4 URLs in 1 queues 


* queue: http://bsdmag.org 


maxThreads = 1 
inProgress = 0 
crawlDelay = 5000 


minCrawlDelay = 0 

nextFetchTime = 1348457886871 
now = 1348457884591 
0. http://bsdmag.org/newsletter 
1. http://bsdmag.org/take-part 


When complete (About 7 minutes as we are not crawling 
with any depth) run the following: 


# mysql -unutch -ppassword 
mysql> use nutch; 


mysql> select count(*) from webpage where status = 2; 


This should show the number of successfully fetched pages. 


Conclusion 

This is a very crude configuration, but should be enough to 
get you started. More attention needs to be paid to the sche- 
ma files (in particular to clear out redundant cruft and to op- 
timize). 
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Registration is available at 


MeetBSD California 2012 promises to be an experience unlike any other. 


MeetBSD California is not your average conference - It’s a meeting of the minds from all over 
the BSD community. MeetBSD California 2012 will feature community - driven break - out sessions, 
discussion groups, and 5-10 minute “lightning talks,’ as well as longer talks from seasoned BSD experts. 
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